Wei Ping
@_weiping
distinguished research scientist @nvidia | post-training, RL, multimodal | generative models for audio. Views are my own.
Introducing AceReason-Nemotron: Advancing math and code reasoning through reinforcement learning (RL) We propose conducting RL on math-only prompts first, then on code-only prompts. Our key findings include: - Math-only RL significantly boosts both math and code benchmarks! -…


🤯 Audio Flamingo 3 is out already... and that's before Audio Flamingo 2 makes its debut at ICML on Wednesday, July 16 at 4:30 p.m.! These benchmark results are insane! arxiv.org/abs/2507.08128
Our released evaluation toolkit can reproduce our AceReason-Nemotron models' numbers (see below): AceReason-Nemotron-1.0-7B: LiveCodeBench (Avg@8): * [05/23-05/24]: 72.0; [06/24-01/25]: 54.2 * release set v5: 51.2; release set v6: 44.4 AIME (Avg@64): * AIME'24: 68.6; AIME'25:…
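The Avg@k numbers quoted above (Avg@8 for LiveCodeBench, Avg@64 for AIME) are, by convention, the mean pass rate over k sampled completions per problem, averaged across problems. A minimal sketch of that metric, assuming 0/1 correctness scores per sample (the function name and data layout here are illustrative, not from the released toolkit):

```python
def avg_at_k(per_problem_scores):
    """Avg@k: mean correctness over k samples per problem, then over problems.

    per_problem_scores: list of lists, each inner list holding the 0/1
    correctness of the k sampled completions for one problem.
    """
    per_problem = [sum(scores) / len(scores) for scores in per_problem_scores]
    return sum(per_problem) / len(per_problem)

# Two problems, k=4 samples each: 3/4 and 1/4 correct -> 0.5 overall.
print(avg_at_k([[1, 1, 1, 0], [0, 0, 1, 0]]))  # 0.5
```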
The first thing we did was make sure the eval setup was correct! We spent a lot of time making sure our eval can - accurately reproduce the DeepSeek-R1 numbers on AIME and LiveCodeBench - it's IMPOSSIBLE to track RL progress without a good eval setup (e.g., we see AIME up…
📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models. The result? We trained a 7B model - AceReason-Nemotron-1.1, significantly improved from version 1.0 on math and coding benchmarks. ✅AIME2025 (math): 53.6% -> 64.8% ✅LiveCodeBench…
With stronger SFT backbone, AceReason-Nemotron-1.1-7B significantly outperforms its predecessor and sets a record-high performance among Qwen2.5-7B-based reasoning models. 📄Report: arxiv.org/pdf/2506.13284 🤗Model: huggingface.co/nvidia/AceReas… 📚SFT Data: huggingface.co/datasets/nvidi…
Introducing AceReason-Nemotron 1.1 Our previous release, AceReason-Nemotron-1.0, introduced a stage-wise RL recipe that was applied sequentially to math-only and code-only prompts, demonstrating both high efficiency and strong effectiveness. Here, we systematically investigate…
Check out our detailed study on advancing math and code reasoning using SFT and RL.

If they can handcuff a U.S. Senator for asking a question, imagine what they will do to you.
Cosmos-Reason1 has exciting updates 💡 Now it understands physical reality — judging videos as real or fake! Check out the resources👇 Paper: arxiv.org/abs/2503.15558 Huggingface: huggingface.co/nvidia/Cosmos-… Code: github.com/nvidia-cosmos/… Project page: research.nvidia.com/labs/dir/cosmo… (1/n)
New reasoning Nemotron-H models are now publicly available. These models are based on a hybrid architecture! 47B and 8B in BF16 and FP8. Blogpost: developer.nvidia.com/blog/nemotron-… Weights: huggingface.co/collections/nv…
Transformers still dominate the LLM scene, but we show that higher-throughput alternatives exist which are just as strong! Grateful to have a part in the Nemotron-H Reasoning effort. 🙏 Technical report will be out soon, stay tuned!
Pass@1024 results of our RL model (AceReason-Nemotron-7B) and its starting SFT model (DeepSeek-R1-Distill-Qwen-7B) on LiveCodeBench-v6, which features a large answer space and high-quality test cases that are difficult to solve through 'guessing', even with extensive sampling.…
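Pass@k at large k, as in the Pass@1024 results above, is commonly computed with the unbiased estimator from the Codex paper (Chen et al., 2021): draw n >= k samples per problem, count the c correct ones, and estimate the probability that at least one of k draws passes. A sketch of that standard formula (assuming this is the estimator used; the thread does not specify):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k).

    n: total samples drawn per problem
    c: number of correct samples
    k: budget being estimated
    """
    if n - c < k:
        # Fewer than k incorrect samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# 1 correct out of 10 samples, k=1 -> 0.1
print(pass_at_k(10, 1, 1))
```

The per-problem estimates are then averaged over the benchmark to get the reported number.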
👍👍
🚀 DeepSeek-R1-0528 is here! 🔹 Improved benchmark performance 🔹 Enhanced front-end capabilities 🔹 Reduced hallucinations 🔹 Supports JSON output & function calling ✅ Try it now: chat.deepseek.com 🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni… 🔗…
Nvidia just dropped AceReason-Nemotron on Hugging Face Advancing Math and Code Reasoning through Reinforcement Learning
Check out our AceReason-Nemotron-14B. 🤗huggingface.co/nvidia/AceReas… We start with RL training using math-only prompts, then continue with code-only prompts, which further enhances coding performance while maintaining math capability.
Lots of good analysis in here!
With just math-RL, AceReason-Nemotron-14B surpasses DeepCoder-14B on LiveCodeBench v5. We then did code-RL and found training became so much easier.
📣 Introducing AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning (RL) Starting from the SFT model DeepSeek-R1-Distill-Qwen-14B, our AceReason-Nemotron-14B achieves substantial improvements in pass@1 accuracy on key benchmarks through RL: AIME…
Llama-Nemotron-v1 technical report is now available on arxiv arxiv.org/pdf/2505.00949…
Introducing Qwen3! We release and open-weight Qwen3, our latest large language models, including 2 MoE models and 6 dense models, ranging from 0.6B to 235B. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general…
Introducing AceMath-RL-Nemotron-7B, a math reasoning model trained entirely through reinforcement learning from DeepSeek-R1-Distill-Qwen-7B. It achieves AIME24: 69.0%, AIME25: 53.6%, and GPQA: 52.1%. Interestingly, this math-focused RL training also improves the coding…
Introducing AceMath-RL-Nemotron-7B, an open math model trained with reinforcement learning from the SFT-only checkpoint: DeepSeek-R1-Distill-Qwen-7B. It achieves: - AIME24: 69.0 (+13.5 gain by RL) - AIME25: 53.6 (+14.4) - LiveCodeBench: 44.4 (surprisingly, +6.8 gain after…