Dylan
@dylan_works_
PhD student. Working on LLM Post-Training
Sharing our recent study, "The Best Instruction-Tuning Data are Those That Fit". We introduce a practical, lightweight approach that selects in-distribution data for supervised fine-tuning, yielding better models with less data and compute, and with surprising simplicity!
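For anyone wondering what "selects in-distribution data" might look like in code, here is a minimal sketch. It assumes one plausible selection rule: score each (prompt, response) pair by the target model's own average log-likelihood of the response, then keep the top fraction. The model name (`gpt2`), the scoring rule, and `keep_frac` are all illustrative assumptions, not the paper's exact method:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative assumption: "fit" = the target model's average per-token
# log-likelihood of the response. This is a sketch, not the paper's recipe.
MODEL_NAME = "gpt2"  # placeholder target model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

@torch.no_grad()
def response_loglik(prompt: str, response: str) -> float:
    """Average log-prob per response token under the target model."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    # Shift: the token at position t is predicted by logits at position t-1.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    start = prompt_ids.shape[1] - 1  # index predicting the first response token
    token_lp = logprobs[torch.arange(start, targets.shape[0]), targets[start:]]
    return token_lp.mean().item()

def select_in_distribution(pairs, keep_frac=0.3):
    """Keep the pairs the model already 'fits' best."""
    scored = sorted(pairs, key=lambda p: response_loglik(*p), reverse=True)
    return scored[: max(1, int(len(scored) * keep_frac))]

data = [("Q: What is 2+2?\nA:", " 4"),
        ("Q: Capital of France?\nA:", " Paris"),
        ("Q: What is 2+2?\nA:", " The answer, verily, is quattro.")]
print(select_in_distribution(data, keep_frac=0.5))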
👀 Interesting! Thanks for sharing the insights, Dr. Diao 😬
Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes, if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, powering the world’s leading 1.5B reasoning model💥 and offering…
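The "just train longer" framing hides the stability trick that makes prolonged RL possible. Below is a toy policy-gradient loop on a 3-armed bandit that sketches one such trick, assuming (as the ProRL write-up describes at a high level) a KL penalty toward a reference policy plus periodic reference-policy resets so the anchor doesn't cap how far long training can move the policy. Every constant and name here is illustrative, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
REWARDS = np.array([0.1, 0.4, 0.9])   # toy 3-armed bandit; arm 2 is best
N_STEPS, RESET_EVERY, LR, KL_COEF = 2000, 500, 0.1, 0.05

logits = np.zeros(3)                  # policy parameters
ref_logits = logits.copy()            # frozen reference policy

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(1, N_STEPS + 1):
    pi = softmax(logits)
    a = rng.choice(3, p=pi)
    r = REWARDS[a] + rng.normal(0, 0.1)

    # REINFORCE term: gradient of r * log pi(a) wrt logits is r*(onehot - pi).
    pg_grad = -r * pi
    pg_grad[a] += r

    # KL(pi || ref) pulls the policy back toward the (stale) reference.
    ref = softmax(ref_logits)
    log_ratio = np.log(pi / ref)
    kl = np.sum(pi * log_ratio)
    kl_grad = pi * (log_ratio - kl)

    logits += LR * (pg_grad - KL_COEF * kl_grad)

    # Periodic reference reset: without it, the KL anchor limits how far
    # prolonged training can ever drift from the starting policy.
    if step % RESET_EVERY == 0:
        ref_logits = logits.copy()
        print(f"step {step}: pi = {softmax(logits).round(3)}")
```

Running it, the policy keeps concentrating on the best arm across resets, whereas a never-reset KL anchor would slow that drift; the same tension is why long-horizon RL recipes need some mechanism of this kind.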
👀👀
You 𝗗𝗢𝗡'𝗧 need so much data to train a Search Agent. Just 2.4k random samples: that's all it takes for our s3. Coming Soon. #LLM #RAG #SearchAgent #EMNLP2025 #NeurIPS2025 #ACL2025NLP #ACL2025 #NAACL2025 #AgenticRAG #AgenticAI #AgenticSearch
Awesome job! @AndrewZ45732491 @YiranWu18
❄️Introducing Absolute Zero Reasoner: our reasoner learns both to propose tasks that maximize learnability and to improve its reasoning by solving them, entirely through self-play, with no external data! Overall, it outperforms other "zero" models in math & coding domains. 🧵 1/
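A toy version of the propose-and-solve loop, assuming "learnability" is rewarded when proposed tasks are neither trivially easy nor unsolvable for the current solver. The difficulty knob, the reward shape, and the fixed stub solver are stand-ins (the real method trains both roles), not the paper's actual setup:

```python
import random

random.seed(0)
DIFFICULTIES = list(range(1, 9))          # task knob, e.g. digits to add

def propose(weights):
    """Proposer: samples a difficulty in proportion to learned weights."""
    total = sum(weights.values())
    r, acc = random.uniform(0, total), 0.0
    for d, w in weights.items():
        acc += w
        if r <= acc:
            return d
    return DIFFICULTIES[-1]

def solver_succeeds(difficulty, skill=3.0):
    """Stub solver: success probability falls off past its skill level."""
    p = 1.0 / (1.0 + 2.0 ** (difficulty - skill))
    return random.random() < p

def learnability(success_rate):
    """Peaks at ~50% solve rate: too easy or too hard teaches little."""
    return success_rate * (1.0 - success_rate)

weights = {d: 1.0 for d in DIFFICULTIES}
for step in range(2000):
    d = propose(weights)
    # Estimate the solver's success rate on this difficulty with a few rollouts.
    rate = sum(solver_succeeds(d) for _ in range(8)) / 8
    # Reinforce the proposer toward difficulties of intermediate success.
    weights[d] += learnability(rate)

print({d: round(w, 1) for d, w in weights.items()})
print("proposer concentrates near difficulty", max(weights, key=weights.get))
```

The proposer drifts toward tasks sitting at the solver's frontier; in the full self-play setting the solver also improves on those tasks, so the frontier (and the curriculum) keeps moving.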