Yang Chen
@ychenNLP
Research Scientist @NVIDIA | PhD @GeorgiaTech | RL and LLM reasoning
📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models. The result? We trained a 7B model - AceReason-Nemotron-1.1, significantly improved from version 1.0 on math and coding benchmarks. ✅AIME2025 (math): 53.6% -> 64.8% ✅LiveCodeBench…

With a stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets a record-high performance among Qwen2.5-7B-based reasoning models. 📄Report: arxiv.org/pdf/2506.13284 🤗Model: huggingface.co/nvidia/AceReas… 📚SFT Data: huggingface.co/datasets/nvidi…
Introducing AceReason-Nemotron 1.1 Our previous release, AceReason-Nemotron-1.0, introduced a stage-wise RL recipe that was applied sequentially to math-only and code-only prompts, demonstrating both high efficiency and strong effectiveness. Here, we systematically investigate…
Does RL incentivize reasoning capability beyond the starting SFT model? We show an interesting result with our recently published AceReason-Nemotron-7B model, which was trained with RL: pass@K (for K from 1 to 1024) is consistently +10% on LiveCodeBench v6. Perhaps scaling RL is the key.
Pass@1024 results of our RL model (AceReason-Nemotron-7B) and its starting SFT model (DeepSeek-R1-Distill-Qwen-7B) on LiveCodeBench-v6, which features a large answer space and high-quality test cases that are difficult to solve through 'guessing', even with extensive sampling.…
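The pass@K numbers discussed above can be computed with the standard unbiased pass@k estimator; a minimal sketch, assuming you have n sampled completions per problem of which c pass the tests (function name and interface are illustrative, not the released eval code):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn without replacement from n total samples is correct,
    given that c of the n samples are correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    # 1 - C(n-c, k) / C(n, k), computed in a numerically stable product form
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```

Averaging this over all benchmark problems gives a pass@K curve like the one in the plot.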
With just math-RL, AceReason-Nemotron-14B surpasses DeepCoder-14B on LiveCodeBench v5. We then did code-RL and found training becomes much easier.
Introducing AceReason-Nemotron: Advancing math and code reasoning through reinforcement learning (RL) We propose conducting RL on math-only prompts first, then on code-only prompts. Our key findings include: - Math-only RL significantly boosts both math and code benchmarks! -…
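A stage-wise recipe like this relies on verifiable rewards for each stage. A hedged sketch of what such reward functions might look like; the helpers below (`math_reward`, `code_reward`) and the `\boxed{}` answer parsing are illustrative assumptions, not the released training code:

```python
import subprocess
import sys
import tempfile

def math_reward(response: str, gold_answer: str) -> float:
    """Binary verifiable reward for a math-only RL stage:
    1.0 if the final boxed answer matches the reference."""
    # Illustrative extraction of the last \boxed{...} span in the response
    pred = response.split("\\boxed{")[-1].split("}")[0].strip()
    return 1.0 if pred == gold_answer.strip() else 0.0

def code_reward(program: str, tests: str) -> float:
    """Binary verifiable reward for a code-only RL stage:
    1.0 if the generated program passes the unit tests."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program + "\n" + tests)
        path = f.name
    # Run the program plus its tests in a subprocess; nonzero exit = failure
    result = subprocess.run([sys.executable, path],
                            capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0
```

In practice the code sandbox would also enforce memory limits and isolation; this sketch only shows the binary pass/fail reward shape.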
Introducing AceMath-RL-Nemotron-7B, a math reasoning model trained entirely through reinforcement learning from DeepSeek-R1-Distilled-Qwen-7B. It achieves AIME24: 69.0%, AIME25: 53.6%, and GPQA: 52.1%. Interestingly, this math-focused RL training also improves the coding…
Introducing AceMath-RL-Nemotron-7B, an open math model trained with reinforcement learning from the SFT-only checkpoint: DeepSeek-R1-Distilled-Qwen-7B. It achieves: - AIME24: 69.0 (+13.5 gain by RL) - AIME25: 53.6 (+14.4) - LiveCodeBench: 44.4 (surprisingly, +6.8 gain after…
Had a lot of fun scaling up RL to improve math reasoning! Excited to introduce AceMath-RL-Nemotron-7B with a scalable training recipe 📑Full blog: research.nvidia.com/labs/adlr/acem… 🔗Model: huggingface.co/nvidia/AceMath…
Our released evaluation toolkit can reproduce our AceReason-Nemotron models numbers (see below): AceReason-Nemotron-1.0-7B: LiveCodeBench (Avg@8): * [05/23-05/24]: 72.0; [06/24-01/25]: 54.2 * release set v5: 51.2; release set v6: 44.4 AIME (Avg@64): * AIME'24: 68.6; AIME'25:…
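The Avg@K metric above is just the mean pass rate over K independent samples per problem, averaged over the benchmark; a minimal sketch, assuming a 0/1 correctness matrix (the function name is illustrative, not the released toolkit's API):

```python
def avg_at_k(correct: list[list[int]]) -> float:
    """correct[i][j] = 1 if sample j for problem i passed, else 0.
    Avg@K = per-problem pass rate over K samples, averaged over problems."""
    per_problem = [sum(row) / len(row) for row in correct]
    return sum(per_problem) / len(per_problem)
```

Unlike pass@K, Avg@K does not grow with more samples; it estimates the single-sample accuracy with reduced variance.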
The first thing we did was make sure the eval setup is correct! We spent a lot of time making sure our eval can accurately reproduce the DeepSeek-R1 numbers on AIME and LiveCodeBench - it's IMPOSSIBLE to track RL progress without a good eval setup (e.g., we see AIME up…
In this paper from NVIDIA, the authors compare SFT and RL and study how the former impacts the latter. Among other findings, they observe that the stronger the SFT model, the smaller the gains from RL. 🔗arxiv.org/pdf/2506.13284
📌Paper: arxiv.org/abs/2506.13284 📌Model: huggingface.co/nvidia/AceReas… 📌SFT Data: huggingface.co/datasets/nvidi… 📌Math RL Data: huggingface.co/datasets/nvidi… A series of our work on reasoning models: 📌5/22/2025: AceReason-Nemotron: Scaling RL for math and code (7B and 14B)…
I tried to reproduce DS-R1-distilled-7B's and AceReason-7B's performance on your split (06/24-01/25), and they turn out to be 41.9 and 54.6 respectively, which is obviously higher than your reported numbers. Anything wrong here? @etash_guha @ryanmart3n
Nvidia just dropped AceReason-Nemotron on Hugging Face Advancing Math and Code Reasoning through Reinforcement Learning