Yixin Ye
@BLeavesYe
Undergrad @sjtu1896. Intern @ GAIR Lab (http://plms.ai). Visiting @stanfordnlp. NLP/LLMs/Reasoning. Looking for a Ph.D. position starting fall 2026.
🤔 How many examples does an LLM need to learn competition-level math?
Conventional wisdom: 100,000+ examples
Our discovery: Just 817 carefully chosen ones 🤩
With pure SFT, LIMO achieves:
57.1% on AIME
94.8% on MATH
LIMO: Less is More for Reasoning 📝
🔗 arxiv.org/pdf/2502.03387

🎉 Excited to announce that LIMO has been accepted by COLM2025 @COLM_conf ! We'll soon release an updated paper with a detailed data construction process and a new version of the dataset - smaller in size but with better performance. Stay tuned!
What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?…
Third-party evaluations highlight LIMO's strong generalization capabilities on AIME 2025!!!
I spent some time evaluating the frontier math models on AIME24 and AIME25 to see how they generalize. An interesting trend I found is that SFT on minimal data can also generalize quite well if you pick the right data. See LIMO-32B. Training with RL does not necessarily lead…
Thrilled to see the LIMO dataset making such an immediate impact! A 10-point boost on AIME24 and GPQA, with a 3-point improvement on MATH-500, is truly exciting. We welcome more researchers to explore and experiment. Together we can push the boundaries of efficient mathematical reasoning…
I'm running a shit-ton of GRPO experiments on DeepSeek's distilled models with the LIMO dataset and it really works well 🔥! Depending on the hyperparameters, I'm able to get ~10 point boost on AIME24 and GPQA, with ~3 point boost on MATH-500 (likely saturated). Link with more…
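For readers unfamiliar with the GRPO mentioned above: its core idea is to sample a group of responses per prompt and standardize each response's reward against the group's mean and standard deviation, replacing a learned value function. A minimal sketch of that advantage computation (the function name and the 0/1 correctness reward are illustrative, not from the experiments above):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages for one prompt's sampled group.

    Each response's advantage is its reward standardized against the
    group mean and std, so no critic/value model is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 sampled answers to one math problem,
# reward 1.0 if the final answer is correct, else 0.0
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers in the group get positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward the correct completions.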