Shizhe Diao
@shizhediao
Research Scientist @NVIDIA focusing on efficient post-training of LLMs. Fine-tune your own LLMs with LMFlow: http://go.uic.edu/shizhe Views are my own.
Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering…

Pass@1024 results of our RL model (AceReason-Nemotron-7B) and its starting SFT model (DeepSeek-R1-Distill-Qwen-7B) on LiveCodeBench-v6, which features a large answer space and high-quality test cases that are difficult to solve through 'guessing', even with extensive sampling.…
Introducing AceReason-Nemotron: Advancing math and code reasoning through reinforcement learning (RL) We propose conducting RL on math-only prompts first, then on code-only prompts. Our key findings include: - Math-only RL significantly boosts both math and code benchmarks! -…
Empowering the model to independently manage the full task is fundamental to unlocking its full potential.
Heard from a little bird that GPT-5 is imminent. - It’s not one model, but multiple models. It has a router that switches between reasoning, non-reasoning, and tool-using models. - That’s why Sam said they’d “fix model naming”: prompts will just auto-route to the right model. -…
🚀 Thrilled to announce Dream-Coder 7B — the most powerful open diffusion code LLM to date.
Huge leap for open-source theorem proving. Goedel-Prover V2 matches 671B models with just 8B...
(1/4)🚨 Introducing Goedel-Prover V2 🚨 🔥🔥🔥 The strongest open-source theorem prover to date. 🥇 #1 on PutnamBench: Solves 64 problems—with far less compute. 🧠 New SOTA on MiniF2F: * 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%. * 8B > 671B: Our 8B…
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
🚨 NVIDIA is launching the Data Filtering Challenge for training edge language models! We believe edge LMs are the future — lightweight, powerful, and ready for real-world tasks like: 🧠 Reasoning 🗣️ Roleplay 🔍 RAG 🔧 Function calling Time to push dataset filtering to the…

Amazing
It finally happened 😭 After 8 months of hard work, the OpenHands agent surpassed the last human developer on our repository, @xingyaow_. Fellow humans, we had a good run.
Compressing LLMs but worried about accuracy? 🎯 New from #NVIDIAResearch: EoRA uses eigenspace low-rank approximation to compensate for errors—no retraining needed. A promising direction for scalable, task-adaptive LLMs. 🔗 nvda.ws/448xkvZ
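EoRA's actual method works in an eigenspace projection; as a rough illustration of the general idea only (low-rank compensation of compression error, not NVIDIA's implementation), here is a plain SVD-based sketch:

```python
import numpy as np

def low_rank_error_compensation(w, w_compressed, rank):
    """Approximate the compression residual (w - w_compressed) with a
    rank-`rank` correction, so that w_compressed + a @ b ~= w."""
    residual = w - w_compressed
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # left factor, scaled by singular values
    b = vt[:rank, :]             # right factor
    return a, b

# Toy example: "compress" a weight matrix by coarse 0.5-step rounding.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_q = np.round(w * 2) / 2
a, b = low_rank_error_compensation(w, w_q, rank=16)
err_before = np.linalg.norm(w - w_q)
err_after = np.linalg.norm(w - (w_q + a @ b))
```

The appeal of this family of methods is exactly what the tweet highlights: the correction is computed in closed form from the residual, so no retraining is needed.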
Impressive work! The parallel generation capability of Multiverse looks amazing!
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: multiverse4fm.github.io 🧵 1/n
🚀 I'm looking for full-time research scientist jobs on foundation models! I study pre-training and post-training of foundation models, and LLM-based coding agents. The figure highlights my research/publications. Please DM me if there is any good fit! Highly appreciated!
Flash Linear Attention (github.com/fla-org/flash-…) will no longer maintain support for the RWKV series (existing code will remain available). Here’s why:
It has not saturated yet. At NVIDIA, we present "prolonged RL", where we significantly scale up RL training steps (>2k) and problems (>130k). The improvement from RL scaling is surprising and exciting. The RL-ed model makes great progress on some problems that the base model…
And this on a 1.5B model :), 136k problems. RL scaling makes us happy.
Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
What happens when you ✨scale up RL✨? In our new work, Prolonged RL, we significantly scale RL training to >2k steps and >130k problems—and observe exciting, non-saturating gains as we spend more compute 🚀.
RL scaling is here arxiv.org/pdf/2505.24864
Super excited to share 💪🧠Reasoning Gym! 🧵 We provide over 100 data generators and verifiers spanning several domains (algebra, arithmetic, code, geometry, logic, games) for training the next generation of reasoning models. In essence, we can generate an infinite amount of…
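The generator/verifier pattern Reasoning Gym is built around can be sketched as follows (hypothetical arithmetic task with made-up function names, not the project's actual API):

```python
import random

def generate_arithmetic_task(rng):
    """Generate one addition problem plus the ground truth needed to verify it.

    Because tasks are produced procedurally from a seeded RNG, one generator
    yields an effectively unlimited stream of fresh training problems.
    """
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    return {"question": f"What is {a} + {b}?", "answer": str(a + b)}

def verify(task, model_output):
    """Programmatic verifier: exact match against the known answer."""
    return model_output.strip() == task["answer"]

rng = random.Random(42)
task = generate_arithmetic_task(rng)
ok = verify(task, task["answer"])      # ground truth always verifies
bad = verify(task, "definitely wrong")
```

Pairing cheap procedural generation with a deterministic verifier is what makes such data usable as an RL reward signal without any human labeling.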
Ah, a very timely paper that validates my current intuition: RL should be scaled beyond 1k steps, and for this you need to simultaneously scale the group size (here up to 256) and expand the search space.
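The "group size" knob refers to sampling many rollouts per prompt and normalizing rewards within that group, GRPO-style. A minimal sketch of that advantage computation (illustrative only, not the paper's code):

```python
import numpy as np

def group_normalized_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: z-score each rollout's reward against the
    other rollouts sampled for the same prompt (the 'group')."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, a group of 8 rollouts: 1.0 = verifier passed, 0.0 = failed.
group = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
adv = group_normalized_advantages(group)
```

A larger group gives a lower-variance baseline per prompt, which is one reason scaling group size helps when pushing RL training this long.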
NVIDIA presents ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models