Gaotang Li
@GaotangLi
First-Year Ph.D. @UofIllinois | Undergrad @UMich. Science of Language Models. Reasoning. Alignment.
Tool-calling turns GPT-4.1 into a near-o1-preview model without a single gradient step. No retraining, just smarter prompts for near-RL performance. 🤯 pass@1 on AIME 2024 rises from 26.7% to 43.3%, bringing it very close to o1-preview. Swapping one prompt…
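A minimal sketch of what such a prompt-only tool-calling setup could look like with the OpenAI Python SDK; the `run_python` tool schema and the execute-and-feed-back loop are illustrative assumptions, not the exact setup behind the quoted numbers.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool: a Python sandbox the model can call for the arithmetic-heavy
# steps of an AIME-style problem. The schema below is illustrative only.
tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute Python code and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}]

messages = [{"role": "user", "content": "<AIME-style problem statement>"}]
response = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)

# If the model emits tool_calls, the caller executes them, appends the results as
# role="tool" messages, and re-queries until a final answer is produced.
```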
It’s exciting to see the application of rubric-based RM in non-verifiable domains!
🤔 How do we train LLMs on real-world tasks where it’s hard to define a single verifiable answer? Our work at @scale_AI introduces Rubrics as Rewards (RaR) — a framework for on-policy post-training that uses structured, checklist-style rubrics as interpretable reward signals. 🧵
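As a rough sketch of the general idea (not the RaR paper's exact recipe), a checklist-style rubric can be turned into a scalar reward by weighting each criterion and normalizing; the `RubricItem` class, criteria, and weights below are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class RubricItem:
    criterion: str
    weight: float

def rubric_reward(item_scores: dict[str, bool], rubric: list[RubricItem]) -> float:
    """Aggregate per-criterion judgments (e.g. from an LLM judge) into a reward in [0, 1]."""
    total = sum(it.weight for it in rubric)
    earned = sum(it.weight for it in rubric if item_scores.get(it.criterion, False))
    return earned / total if total > 0 else 0.0

rubric = [
    RubricItem("cites the relevant guideline", 2.0),
    RubricItem("explains the trade-off explicitly", 1.0),
    RubricItem("avoids unsupported claims", 1.0),
]
print(rubric_reward({"cites the relevant guideline": True,
                     "avoids unsupported claims": True}, rubric))  # 0.75
```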
horrifying bug of the day is finding out that vLLM and Hugging Face produce significantly different logprobs discuss.vllm.ai/t/numerical-di…
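For context, a minimal sketch of how one might compare per-token prompt logprobs from the two stacks; the model name is a placeholder and the APIs follow recent transformers/vLLM releases, so details may differ across versions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder model
prompt = "The capital of France is Paris."

# --- Hugging Face: teacher-forced logprobs of the prompt tokens ---
tok = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).eval()
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = hf_model(ids).logits
logprobs = torch.log_softmax(logits.float(), dim=-1)
hf_lp = logprobs[0, :-1].gather(-1, ids[0, 1:, None]).squeeze(-1)  # logprob of token t given prefix

# --- vLLM: request prompt logprobs for the same tokens ---
llm = LLM(model=model_name, dtype="bfloat16")
out = llm.generate([prompt], SamplingParams(max_tokens=1, prompt_logprobs=0))[0]
vllm_lp = [d[t].logprob for d, t in zip(out.prompt_logprobs[1:], ids[0, 1:].tolist())]

# Gaps come from kernels, reduction order, and dtype handling, not necessarily from bugs.
print(torch.tensor(vllm_lp) - hf_lp)
```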
We did a deep-dive on the (many) open source RL frameworks out there, and tried to distill their core design philosophies and supported features. If you're trying to decide which framework to use for your next RL run, this might help: anyscale.com/blog/open-sour…
🌿Introducing NaturalThoughts 🌿 arxiv.org/abs/2507.01921 🎯 Data curation for general reasoning capabilities is still relatively underexplored. - We systematically compare different metrics for selecting high-quality and diverse reasoning traces in terms of data efficiency in…
We've always been excited about self-play unlocking continuously improving agents. Our insight: RL selects generalizable CoT patterns from pretrained LLMs. Games provide perfect testing grounds with cheap, verifiable rewards. Self-play automatically discovers and reinforces…
People are racing to push math reasoning performance in #LLMs—but have we really asked why? The common assumption is that improving math reasoning should transfer to broader capabilities in other domains. But is that actually true? In our study (arxiv.org/pdf/2507.00432), we…
What happens behind the "abrupt learning" curve in Transformer training? Our new work (led by @GopalaniPulkit) reveals universal characteristics of Transformers' early-phase training dynamics—uncovering the implicit biases and the degenerate state the model gets stuck in. ⬇️
Excited to announce our recent work on understanding training-time emergence in Transformers! Thread🧵(1/11)
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?…
What will the learning environments that train artificial superintelligence look like in the future? In recent work at @scale_AI, we show that training systems combining verifiable rewards with multi-agent interaction accelerate learning.
❓ Are LLMs actually problem solvers or just good at regurgitating facts? 🚨New Benchmark Alert! We built HeuriGym to benchmark whether LLMs can craft real heuristics for hard real-world combinatorial optimization problems. 🛞 We’re open-sourcing it all: ✅ 9 problems ✅ Iterative…
Thrilled to share our new reasoning model, Polaris✨! The 4B version achieves a score of 79.4 on AIME 2025, surpassing Claude 4 Opus (75.5) We’re releasing the full RL recipe, data, and weights 🔓 — see all the details below
🚨 4B open-recipe model beats Claude-4-Opus 🔓 100% open data, recipe, model weights and code. Introducing Polaris✨--a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels…
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs "The Pass@K metric itself is a flawed measure of reasoning, as it credits correct final answers that probably arise from inaccurate or incomplete chains of thought (CoTs). To…
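For reference, the standard unbiased pass@k estimator (Chen et al., 2021) makes the critique concrete: it only checks whether a sample's final answer is correct, so a lucky answer on top of a broken CoT counts the same as a sound derivation. The numbers below are illustrative.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn from n is correct,
    given that c of the n samples have a correct final answer."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# The metric never inspects the chain of thought: a sample whose reasoning is flawed
# but whose final answer happens to be right still counts toward c.
print(pass_at_k(n=16, c=4, k=8))
```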
🔥Excited to share our new work on reproducibility challenges in reasoning models caused by numerical precision. Ever run the same prompt twice and get completely different answers from your LLM under greedy decoding? You're not alone. Most LLMs today default to BF16 precision,…
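A tiny illustration of the underlying issue (not the paper's experiments): BF16 keeps only ~8 bits of mantissa, so floating-point addition is visibly non-associative, and a change in reduction order can flip an argmax between two near-tied logits, which greedy decoding then amplifies token by token.

```python
import torch

a = torch.tensor(1.0,  dtype=torch.bfloat16)
b = torch.tensor(1e-3, dtype=torch.bfloat16)
c = torch.tensor(-1.0, dtype=torch.bfloat16)

# Same three numbers, two groupings: in BF16 the results disagree,
# while in FP32/FP64 the gap would be orders of magnitude smaller.
left = (a + b) + c   # the small term is rounded away before c cancels a
right = a + (b + c)  # the small term survives the final addition
print(left.item(), right.item())  # the two values differ
```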
Spurious Rewards was not all‼️We now present spurious PROMPTS🤔 check out our latest findings and discussion on evaluation: tinyurl.com/spurious-prompt. Who knew Lorem ipsum could bring 19.4% gains over the default prompt👀 Also, arXiv is out🤩 arxiv.org/abs/2506.10947📄
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: +28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
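To make the comparison concrete, the three reward variants in the thread can be read as roughly the following shapes; the function names and exact-match check are my own sketch, not the paper's code.

```python
import random

def ground_truth_reward(pred: str, gold: str) -> float:
    """Standard RLVR signal: 1 if the extracted final answer matches the reference."""
    return 1.0 if pred.strip() == gold.strip() else 0.0

def incorrect_reward(pred: str, gold: str) -> float:
    """Deliberately inverted signal: rewards only non-matching answers."""
    return 1.0 - ground_truth_reward(pred, gold)

def random_reward(pred: str, gold: str) -> float:
    """Coin-flip signal, independent of the model's answer."""
    return float(random.random() < 0.5)
```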
🚨 New Paper Alert 🚨 We found that Supervised Fine-tuning on ONE problem can achieve a similar performance gain to RL on ONE problem with 20x less compute! Paper: arxiv.org/abs/2506.03295 Recently, people have shown that RL can work even with ONE example. This indicates that the…
Can LLMs make rational decisions like human experts? 📖Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker We introduce a novel framework that constructs a semantically grounded decision space to evaluate trade-offs in hard decision-making…
🚀 Introducing RAST: Reasoning Activation via Small Model Transfer! ✨ RAST adjusts key "reasoning tokens" at decoding time using insights from smaller RL-tuned models — no full RL tuning for large models! ⚡ Efficient & Performant, 🧠 Scalable & Easy, 📉 Up to 50% less GPU memory!
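The tweet doesn't spell out the mechanism, but one common way to transfer a small RL-tuned model's behavior at decode time is a logit-delta adjustment (in the spirit of proxy tuning); treat the sketch below as an illustrative assumption rather than RAST's actual algorithm, and note it assumes all three models share a tokenizer.

```python
import torch

def delta_adjusted_logits(z_large: torch.Tensor,
                          z_small_rl: torch.Tensor,
                          z_small_base: torch.Tensor,
                          alpha: float = 1.0) -> torch.Tensor:
    """Shift the large model's next-token logits by the small pair's RL-vs-base delta."""
    return z_large + alpha * (z_small_rl - z_small_base)

# Toy decode step over a 5-token vocabulary (random logits stand in for real models).
vocab = 5
z_large, z_small_rl, z_small_base = (torch.randn(vocab) for _ in range(3))
next_token = torch.argmax(delta_adjusted_logits(z_large, z_small_rl, z_small_base))
```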