Yuqing Yang
@yyqcode
First-year PhD student @CSatUSC @nlp_usc.
🧐Do LLMs admit their mistakes when they should know better? In our new paper, we define this behavior as retraction: the model indicates that its generated answer was wrong. LLMs can retract—but they rarely do.🤯 arxiv.org/abs/2505.16170 👇🧵

Billion-parameter LLMs still struggle with simple arithmetic? 📞 FoNE (Fourier Number Embedding) tackles this problem. By mapping numbers directly into Fourier space, it bypasses tokenization and significantly improves numerical accuracy and efficiency.
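For intuition, here is a minimal sketch of the Fourier-feature idea rather than the paper's implementation: each number becomes cos/sin features at a few periods, so the model sees one embedding per number instead of a sequence of digit tokens. The function name, period choices, and dimensions below are illustrative assumptions.

```python
import torch

def fourier_number_embedding(x: torch.Tensor, periods=(10., 100., 1000., 10000.)) -> torch.Tensor:
    # Hypothetical sketch: one (cos, sin) pair per period, so each pair cycles
    # once per digit position (units, tens, hundreds, ...).
    # Returns a (batch, 2 * num_periods) tensor.
    phase = 2 * torch.pi * x.float().unsqueeze(-1) / torch.tensor(periods)
    return torch.cat([torch.cos(phase), torch.sin(phase)], dim=-1)

# A number like 1234 becomes a single 8-dim vector instead of several digit tokens.
print(fourier_number_embedding(torch.tensor([3.14, 42.0, 1234.0])).shape)  # torch.Size([3, 8])
```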
🚨 4B open-recipe model beats Claude-4-Opus 🔓 100% open data, recipe, model weights and code. Introducing Polaris✨--a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels…
There’s been hot debate about (The Illusion of) The Illusion of Thinking. My take: it’s not that models can’t reason; they just aren’t perfect at long-form generation yet. We evaluate reasoning models on the LongProc benchmark (which requires generating 8K-token CoTs, see thread). Reasoning…
🤔Most LLMs now have >=128K context windows, but are they good at generating long outputs, such as writing an 8K-token chain-of-thought for a planning problem? 🔔Introducing LongProc (Long Procedural Generation), a new benchmark with 6 diverse tasks that challenge LLMs to synthesize…
🧵 Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it? We tested this systematically—and found they can't fully integrate feedback, even when the feedback is high-quality and backed by ground-truth.
🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions. We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations. 🧵 1/n
Textual steering vectors can improve visual understanding in multimodal LLMs! You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs. And They Steer!
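As a concrete illustration (my own minimal sketch, not the paper's code), a MeanShift-style steering vector is just the difference of mean hidden states between prompts with and without the target attribute, added back to chosen token positions at inference. The function names, mask convention, and scale alpha are assumptions.

```python
import torch

def mean_shift_vector(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    # pos_acts / neg_acts: (num_prompts, hidden_dim) activations collected at one layer.
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

def steer(hidden_states: torch.Tensor, vec: torch.Tensor,
          token_mask: torch.Tensor, alpha: float = 4.0) -> torch.Tensor:
    # hidden_states: (seq_len, hidden_dim); token_mask: (seq_len,) bool selecting
    # image tokens, text tokens, or both. Adds the scaled vector only at those positions.
    return hidden_states + alpha * token_mask.float().unsqueeze(-1) * vec
```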
Want to know what your LLM doesn’t know? This is how 👇 Preprint: arxiv.org/abs/2503.23361 Code: github.com/uscnlp-lime/SEA
Running your model on multiple GPUs but finding the speed unsatisfactory? We introduce Ladder-residual, a parallelism-aware architecture modification that makes a 70B Llama with tensor parallelism ~30% faster! Work done at @togethercompute. Co-1st author with @MayankMish98…
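Roughly, the speedup comes from overlapping tensor-parallel communication with computation. The toy sketch below is my own illustration of that general overlap idea under assumed async-collective helpers (all_reduce_async, wait), not the paper's actual architecture.

```python
def overlapped_forward(x, blocks, all_reduce_async, wait):
    # Toy illustration: instead of blocking on each block's tensor-parallel
    # all-reduce before the next block runs, kick off the collective and fold
    # its result into the residual stream one step later, so communication
    # for block i overlaps with computation of block i+1.
    pending = None
    for block in blocks:
        if pending is not None:
            x = x + wait(pending)             # add the previous block's (now reduced) output
        pending = all_reduce_async(block(x))  # start this block's all-reduce without waiting
    return x + wait(pending)
```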
Come join the #NeurIPS2024 poster session and discuss whether language models can learn to skip steps in reasoning! 🗓Dec 12, Thursday, 11:00 am - 2:00 pm 📍East Exhibit Hall A-C #2900 Feel free to stop by and say hi! I am actively seeking Summer 2025 internship opportunities!
🤔Can LMs learn to skip steps to improve reasoning efficiency while maintaining accuracy? ✅The answer is Yes! In our #NeurIPS 2024 work, we show this behavior boosts efficiency, maintains accuracy, and even enhances generalization in OOD scenarios! 🚀arxiv.org/pdf/2411.01855 🧵⬇️
I'll present a poster for Lifelong ICL and Task Haystack at #NeurIPS2024! ⏰ Wednesday 11am-2pm 📍 East Exhibit Hall A-C #2802 📜 arxiv.org/abs/2407.16695 My co-first author @xiaoyue02_xu is applying to PhD programs and I am looking for jobs in industry! Happy to connect at NeurIPS!
Introducing 𝗟𝗶𝗳𝗲𝗹𝗼𝗻𝗴 𝗜𝗖𝗟 and 𝗧𝗮𝘀𝗸 𝗛𝗮𝘆𝘀𝘁𝗮𝗰𝗸, a new approach for evaluating long-context LMs, featuring ever-changing task streams that controllably fill the context window, and NIAH-style visualization for easy diagnosis. 📜 arxiv.org/abs/2407.16695 🧵
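To make the setup concrete, here's a tiny sketch (the prompt format and function name are my own, hypothetical) of how a Task Haystack-style prompt could be assembled: few-shot blocks for a stream of tasks controllably fill the context, and the model is then queried on a task it saw earlier in the stream.

```python
def build_lifelong_icl_prompt(task_stream, test_task, test_input):
    # task_stream: list of (task_name, [(input, output), ...]) few-shot blocks
    # in the order they "arrive"; the test task is one of them, revisited later.
    parts = []
    for task_name, examples in task_stream:
        parts.append(f"Task: {task_name}")
        parts.extend(f"Input: {x}\nOutput: {y}" for x, y in examples)
    parts.append(f"Task: {test_task}")
    parts.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(parts)
```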