Yang Chen
@ychenNLP
Research Scientist @NVIDIA | PhD @GeorgiaTech | RL and LLM reasoning
📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models. The result? We trained a 7B model - AceReason-Nemotron-1.1, significantly improved from version 1.0 on math and coding benchmarks. ✅AIME2025 (math): 53.6% -> 64.8% ✅LiveCodeBench…

With a stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets a record-high performance among Qwen2.5-7B-based reasoning models. 📄Report: arxiv.org/pdf/2506.13284 🤗Model: huggingface.co/nvidia/AceReas… 📚SFT Data: huggingface.co/datasets/nvidi…
Introducing AceReason-Nemotron 1.1 Our previous release, AceReason-Nemotron-1.0, introduced a stage-wise RL recipe that was applied sequentially to math-only and code-only prompts, demonstrating both high efficiency and strong effectiveness. Here, we systematically investigate…
Does RL incentivize reasoning capability beyond the starting SFT model? We show an interesting result with our recently published AceReason-Nemotron-7B model, which was trained with RL: pass@K (for K from 1 to 1024) is consistently +10% on LiveCodeBench v6. Perhaps scaling RL is the key.
Pass@1024 results of our RL model (AceReason-Nemotron-7B) and its starting SFT model (DeepSeek-R1-Distill-Qwen-7B) on LiveCodeBench-v6, which features a large answer space and high-quality test cases that are difficult to solve through 'guessing', even with extensive sampling.…
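The pass@K numbers discussed above can be computed with the standard unbiased pass@k estimator; a minimal sketch, assuming you have n sampled completions per problem of which c pass the tests (function name and interface are illustrative, not the released eval code):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n total samples is correct,
    given that c of the n samples are correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    # 1 - C(n-c, k) / C(n, k), computed in a numerically stable product form
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```

Averaging this over all benchmark problems gives a pass@K curve like the one in the plot.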
With just math-RL, AceReason-Nemotron-14B surpasses DeepCoder-14B on LiveCodeBench v5. We then did code-RL and found training becomes much easier.
Introducing AceReason-Nemotron: Advancing math and code reasoning through reinforcement learning (RL) We propose conducting RL on math-only prompts first, then on code-only prompts. Our key findings include: - Math-only RL significantly boosts both math and code benchmarks! -…
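A stage-wise recipe like this relies on verifiable rewards for each stage. A hedged sketch of what such reward functions might look like; the helpers below (`math_reward`, `code_reward`) and the `\boxed{}` answer parsing are illustrative assumptions, not the released training code:

```python
import subprocess
import sys
import tempfile

def math_reward(response: str, gold_answer: str) -> float:
    """Binary verifiable reward for a math-only RL stage:
    1.0 if the final boxed answer matches the reference."""
    # Illustrative extraction of the last \boxed{...} span in the response
    pred = response.split("\\boxed{")[-1].split("}")[0].strip()
    return 1.0 if pred == gold_answer.strip() else 0.0

def code_reward(program: str, tests: str) -> float:
    """Binary verifiable reward for a code-only RL stage:
    1.0 if the generated program passes the unit tests."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program + "\n" + tests)
        path = f.name
    # Run the program plus its tests in a subprocess; nonzero exit = failure
    result = subprocess.run([sys.executable, path],
                            capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0
```

In practice the code sandbox would also enforce memory limits and isolation; this sketch only shows the binary pass/fail reward shape.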
Introducing AceMath-RL-Nemotron-7B, a math reasoning model trained entirely through reinforcement learning from DeepSeek-R1-Distilled-Qwen-7B. It achieves AIME24: 69.0%, AIME25: 53.6%, and GPQA: 52.1%. Interestingly, this math-focused RL training also improves the coding…
Introducing AceMath-RL-Nemotron-7B, an open math model trained with reinforcement learning from the SFT-only checkpoint: DeepSeek-R1-Distilled-Qwen-7B. It achieves: - AIME24: 69.0 (+13.5 gain by RL) - AIME25: 53.6 (+14.4) - LiveCodeBench: 44.4 (surprisingly, +6.8 gain after…
Had a lot of fun scaling up RL to improve math reasoning! Excited to introduce AceMath-RL-Nemotron-7B with a scalable training recipe 📑Full blog: research.nvidia.com/labs/adlr/acem… 🔗Model: huggingface.co/nvidia/AceMath…
Our released evaluation toolkit can reproduce our AceReason-Nemotron models numbers (see below): AceReason-Nemotron-1.0-7B: LiveCodeBench (Avg@8): * [05/23-05/24]: 72.0; [06/24-01/25]: 54.2 * release set v5: 51.2; release set v6: 44.4 AIME (Avg@64): * AIME'24: 68.6; AIME'25:…
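The Avg@K metric above is just the mean pass rate over K independent samples per problem, averaged over the benchmark; a minimal sketch, assuming a 0/1 correctness matrix (the function name is illustrative, not the released toolkit's API):

```python
def avg_at_k(correct: list[list[int]]) -> float:
    """correct[i][j] = 1 if sample j for problem i passed, else 0.
    Avg@K = per-problem pass rate over K samples, averaged over problems."""
    per_problem = [sum(row) / len(row) for row in correct]
    return sum(per_problem) / len(per_problem)
```

Unlike pass@K, Avg@K does not grow with more samples; it estimates the single-sample accuracy with reduced variance.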
The first thing we did was make sure the eval setup is correct! We spent a lot of time making sure our eval can accurately reproduce the DeepSeek-R1 numbers on AIME and LiveCodeBench - it's IMPOSSIBLE to track RL progress without a good eval setup (e.g., we see AIME up…
In this paper from NVIDIA, the authors compare SFT and RL and study how the former impacts the latter. Among other findings, they observe that the stronger the SFT model, the smaller the gains from RL. 🔗arxiv.org/pdf/2506.13284
📌Paper: arxiv.org/abs/2506.13284 📌Model: huggingface.co/nvidia/AceReas… 📌SFT Data: huggingface.co/datasets/nvidi… 📌Math RL Data: huggingface.co/datasets/nvidi… A series of our work on reasoning models: 📌5/22/2025: AceReason-Nemotron: Scaling RL for math and code (7B and 14B)…
I tried to reproduce DS-R1-distilled-7B's and AceReason-7B's performance on your split (06/24-01/25), and they turn out to be 41.9 and 54.6 respectively, which is obviously higher than your reported numbers. Anything wrong here? @etash_guha @ryanmart3n
Nvidia just dropped AceReason-Nemotron on Hugging Face Advancing Math and Code Reasoning through Reinforcement Learning