Lifan Yuan
@lifan__yuan
PhD student @UofIllinois @uiuc_nlp @GoogleDeepMind. Prev: @TsinghuaNLP
How to unlock advanced reasoning via scalable RL? 🚀Introducing PRIME (Process Reinforcement through Implicit Rewards) and Eurus-2, trained from Base model to surpass Qwen2.5-Math-Instruct using only 1/10 of the data. We're still scaling up - w/ 3x more training data to go! 🧵
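A minimal sketch of the implicit process reward idea behind PRIME, assuming the formulation where per-token rewards come from the log-likelihood ratio between the outcome-trained model and a frozen reference, r_t = β·log(π_φ(y_t|y_<t)/π_ref(y_t|y_<t)). All numbers below (β, the log-probs) are illustrative, not from the paper:

```python
import numpy as np

beta = 0.05  # KL coefficient (illustrative value, not the paper's setting)

# Per-token log-probs of the same sampled response under the implicit PRM
# (an outcome-trained LM) and a frozen reference model. Synthetic numbers.
logp_prm = np.array([-1.2, -0.4, -2.1, -0.3])
logp_ref = np.array([-1.5, -0.5, -1.8, -0.9])

# Implicit process reward per token: beta * log-likelihood ratio.
# No process labels are needed; the token-level signal falls out for free.
r_tok = beta * (logp_prm - logp_ref)

# Summing token-level rewards recovers a sequence-level (outcome) reward.
r_outcome = r_tok.sum()
```

The point of the construction is that dense per-step rewards are obtained from a model trained only on outcome labels.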


We've updated an analysis of format correction in 1-shot RLVR, a frequently asked question (thx for all the feedback!) Summary: (1) Format correction indeed contributes a lot in RLVR (e.g., 18% -> 29% for Qwen2.5-Math-1.5B over 6 math tasks), in both full-set (1.2k data) and one-shot…
With Grok-4, RL is the new pre-training
We built 200k-GPU clusters; We scaled up & curated higher-quality data; We scaled compute by 100x; We developed training & test-time recipes; We made everything RL native; We stabilized infrastructure and sped things up; That's how you bring RL to pre-training scale. Yet I am…
🚨 Deadline for SCALR 2025 Workshop: Test‑time Scaling & Reasoning Models at COLM '25 @COLM_conf is approaching!🚨 scalr-workshop.github.io 🧩 Call for short papers (4 pages, non‑archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24. Topics…
🚀 I'm looking for full-time research scientist jobs on foundation models! I study pre-training and post-training of foundation models, and LLM-based coding agents. The figure highlights my research/publications. Please DM me if there is any good fit! Highly appreciated!
It shares a similar spirit with scaling laws: we can use early runs to fit a and b, then predict the final perf (at H=0, R = -a+b). Also, it's base models that determine the ceiling, not algos; the algos have the same efficiency in consuming entropy, as indicated by the similar a.
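A minimal sketch of the fit described above, assuming the entropy-performance relation has the form R = -a·e^H + b (so that H=0 gives the ceiling R = -a+b, matching the tweet). Since R is linear in e^H, ordinary least squares suffices; the (H, R) points below are synthetic illustrations, not real measurements:

```python
import numpy as np

# Synthetic early-run checkpoints: policy entropy H and downstream reward R.
H = np.array([1.2, 1.0, 0.8, 0.6, 0.5])
R = np.array([0.30, 0.38, 0.45, 0.52, 0.55])

# R = -a * exp(H) + b is linear in exp(H), so fit (a, b) by least squares.
X = np.stack([-np.exp(H), np.ones_like(H)], axis=1)
(a, b), *_ = np.linalg.lstsq(X, R, rcond=None)

# Predicted performance ceiling once entropy is fully consumed (H = 0).
R_ceiling = -a + b
```

Comparing the fitted a across algorithms (similar a means similar efficiency in consuming entropy) and the fitted b across base models (which set the ceiling) is what the argument above relies on.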
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: +28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
Can entropy minimization alone improve LLM performance? And how far can it go without any labeled data? This work answers both: yes, and surprisingly far 🐮 At inference, EM can beat GPT-4o, Claude 3 Opus & Gemini 1.5 Pro on challenging scientific coding w/o any data/model update