Ximing Lu
@GXiming
PhD @uwcse @uwnlp.
With the rise of R1, search seems out of fashion? We prove the opposite! 😎 Introducing Retro-Search 🌈: an MCTS-inspired search algorithm that RETROspectively revises R1’s reasoning traces to synthesize untaken, new reasoning paths that are better 💡, yet shorter in length ⚡️.
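For concreteness, here is a minimal sketch of the retrospective-revision loop described above; the step splitting and the `rollout` / `is_correct` callables are illustrative placeholders, not the paper's actual code.

```python
# Minimal sketch of retrospective trace revision (illustrative, not the paper's code).
from typing import Callable, List

def retro_search(
    steps: List[str],                             # original reasoning trace, split into steps
    rollout: Callable[[List[str]], List[str]],    # model continuation from a partial trace
    is_correct: Callable[[List[str]], bool],      # does a candidate trace reach the right answer?
) -> List[str]:
    """Greedily revise a trace by exploring continuations it never took."""
    best = steps
    i = 0
    while i < len(best):
        prefix = best[:i]                         # keep the trace up to step i
        candidate = prefix + rollout(prefix)      # splice in an untaken continuation
        # Accept the revision only if it still solves the problem and is shorter.
        if is_correct(candidate) and len(candidate) < len(best):
            best = candidate
        i += 1
    return best
```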

Meanwhile, ProRL v2 is coming soon — building on our idea of prolonged reinforcement learning (ProRL), we continue to scale up training steps with stable reinforcement learning. The current results look very promising. Stay tuned!
New tech report out! 🚀 Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training. An expanded version of our ProRL paper — now with more training insights and experimental details. Read it here 👉 arxiv.org/abs/2507.12507
Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering…
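A rough pseudocode picture of what a prolonged RL loop with periodic reference-policy resets could look like; every object, method, and hyperparameter below is a placeholder sketch, not the ProRL implementation.

```python
# Illustrative pseudocode only: placeholder objects, not the ProRL implementation.
import copy

def prolonged_rl(policy, tasks, total_steps=2000, reset_every=500, kl_coef=1e-3):
    ref_policy = copy.deepcopy(policy)                 # frozen reference for a KL penalty
    for step in range(total_steps):
        prompts, verifier = tasks.sample_batch()       # verifiable problems (math, code, ...)
        rollouts = policy.generate(prompts)            # sample reasoning traces
        rewards = verifier(rollouts)                   # e.g. 1 if the final answer checks out
        kl = policy.kl_to(ref_policy, rollouts)        # keeps the policy near the reference
        policy.step(rollouts, rewards - kl_coef * kl)  # policy-gradient update (GRPO-style)
        # Periodically reset the reference so the KL term does not pin the model
        # down over thousands of training steps.
        if (step + 1) % reset_every == 0:
            ref_policy = copy.deepcopy(policy)
    return policy
```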
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems—with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B…
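Since the thread quotes a Pass@32 number, for context: the standard unbiased pass@k estimator (Chen et al., 2021), which is presumably the metric meant here.

```python
# Standard unbiased pass@k estimator: n samples per problem, c of them correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:               # every size-k subset contains at least one correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: pass_at_k(32, 1, 8) == 0.25, the chance that 8 of 32 sampled
# attempts include the single correct proof.
```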
🚀New Paper! arxiv.org/abs/2506.13342 While fact verification is essential to ensure the reliability of LLMs, detailed analysis of fact verifiers remains understudied. We present several findings based on our revised dataset, along with practical guidance to improve the models.
🚨New Paper Alert🚨 Excited to share our new video game benchmark, "Orak"! 🕹️ It was a thrilling experience to test whether LLM/VLM agents can solve real video games 🎮 Looking forward to continuing my research on LLM/VLM-based game agents with @Krafton_AI !
As a video gaming company, @Krafton_AI has secretly been cooking something big with @NVIDIAAI for a while! 🥳 We introduce Orak, the first comprehensive video gaming benchmark for LLMs! arxiv.org/abs/2506.03610
What kind of data diversity helps reasoning models to generalize better — and how can we get more of it 🧐? 👇 Read on to see what we found! ✨
Data curation is crucial for LLM reasoning, but how do we know whether our dataset is overfit to one benchmark or actually generalizes to unseen distributions? 🤔 𝐃𝐚𝐭𝐚 𝐝𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 is key: when measured correctly, it strongly predicts model generalization in reasoning tasks! 🧵
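One common embedding-based proxy for data diversity is the average pairwise distance over the training prompts (the paper's actual metric may differ); a minimal sketch assuming sentence-transformers is available.

```python
# Illustrative diversity proxy: mean pairwise cosine distance between prompt embeddings.
from sentence_transformers import SentenceTransformer  # assumed installed

def diversity_score(prompts: list[str]) -> float:
    if len(prompts) < 2:
        return 0.0
    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(prompts, normalize_embeddings=True)  # unit-norm embeddings
    sims = emb @ emb.T                                       # pairwise cosine similarities
    n = len(prompts)
    mean_sim = (sims.sum() - n) / (n * (n - 1))              # exclude the diagonal
    return float(1.0 - mean_sim)                             # higher = more diverse
```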
What happens when you ✨scale up RL✨? In our new work, Prolonged RL, we significantly scale RL training to >2k steps and >130k problems—and observe exciting, non-saturating gains as we spend more compute 🚀.
RL scaling is here arxiv.org/pdf/2505.24864
Nvidia presents ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
if no one else is showing that RL isn't just eliciting latent behavior already learned in pretraining, but is actually a new scaling paradigm, nvidia has to do it themselves
And this on a 1.5B model :), 136k problems. RL scaling makes us happy
Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning Author's Explanation: x.com/GXiming/status… While distilling reasoning paths from large models can boost smaller models, these paths are often inefficient. Retro-Search is a new algorithm designed to…
Humans backtrack when we realize we should’ve made a better decision. How do we do this? We search and simulate alternative paths that might have led to better outcomes. Our🌈RETRO-Search mimics this process, empowering models to achieve SOTA performance AND efficient reasoning in math🌟
What if longer reasoning isn’t always better, and blind shortening doesn’t always work? In our latest work, we use search as an effective means to reduce both overthinking and underthinking, synthesizing reasoning trajectories that are efficient and insightful. Check it out! 👇
Finetuning on raw DeepSeek R1 reasoning traces makes models overthink. One of our early s1 versions was overthinking so much, it questioned the purpose of math when just asking what's 1+1😁 Retro-Search by @GXiming & team reduces overthinking + improves performance!