Dylan
@dylan_works_
PhD student. Working on LLM Post-Training
Sharing our recent study, "The Best Instruction-Tuning Data are Those That Fit". We introduce a practical, lightweight approach that selects in-distribution data for supervised fine-tuning, yielding better models with less data and compute, and with surprising simplicity!
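For anyone wondering what "selects in-distribution data" might look like in code, here is a minimal sketch. It assumes one plausible selection rule: score each (prompt, response) pair by the target model's own average log-likelihood of the response, then keep the top fraction. The model name (`gpt2`), the scoring rule, and `keep_frac` are all illustrative assumptions, not the paper's exact method:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative assumption: "fit" = the target model's average per-token
# log-likelihood of the response. This is a sketch, not the paper's recipe.
MODEL_NAME = "gpt2"  # placeholder target model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

@torch.no_grad()
def response_loglik(prompt: str, response: str) -> float:
    """Average log-prob per response token under the target model."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    # Shift: the token at position t is predicted by logits at position t-1.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    start = prompt_ids.shape[1] - 1  # index predicting the first response token
    token_lp = logprobs[torch.arange(start, targets.shape[0]), targets[start:]]
    return token_lp.mean().item()

def select_in_distribution(pairs, keep_frac=0.3):
    """Keep the pairs the model already 'fits' best."""
    scored = sorted(pairs, key=lambda p: response_loglik(*p), reverse=True)
    return scored[: max(1, int(len(scored) * keep_frac))]

data = [("Q: What is 2+2?\nA:", " 4"),
        ("Q: Capital of France?\nA:", " Paris"),
        ("Q: What is 2+2?\nA:", " The answer, verily, is quattro.")]
print(select_in_distribution(data, keep_frac=0.5))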
👀 Interesting! Thanks for sharing the insights, Dr. Diao 😬
Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes, if you push RL training long enough! Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, powering the world’s leading 1.5B reasoning model💥 and offering…
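The "just train longer" framing hides the stability trick that makes prolonged RL possible. Below is a toy policy-gradient loop on a 3-armed bandit that sketches one such trick, assuming (as the ProRL write-up describes at a high level) a KL penalty toward a reference policy plus periodic reference-policy resets so the anchor doesn't cap how far long training can move the policy. Every constant and name here is illustrative, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
REWARDS = np.array([0.1, 0.4, 0.9])   # toy 3-armed bandit; arm 2 is best
N_STEPS, RESET_EVERY, LR, KL_COEF = 2000, 500, 0.1, 0.05

logits = np.zeros(3)                  # policy parameters
ref_logits = logits.copy()            # frozen reference policy

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(1, N_STEPS + 1):
    pi = softmax(logits)
    a = rng.choice(3, p=pi)
    r = REWARDS[a] + rng.normal(0, 0.1)

    # REINFORCE term: gradient of r * log pi(a) wrt logits is r*(onehot - pi).
    pg_grad = -r * pi
    pg_grad[a] += r

    # KL(pi || ref) pulls the policy back toward the (stale) reference.
    ref = softmax(ref_logits)
    log_ratio = np.log(pi / ref)
    kl = np.sum(pi * log_ratio)
    kl_grad = pi * (log_ratio - kl)

    logits += LR * (pg_grad - KL_COEF * kl_grad)

    # Periodic reference reset: without it, the KL anchor limits how far
    # prolonged training can ever drift from the starting policy.
    if step % RESET_EVERY == 0:
        ref_logits = logits.copy()
        print(f"step {step}: pi = {softmax(logits).round(3)}")
```

Running it, the policy keeps concentrating on the best arm across resets, whereas a never-reset KL anchor would slow that drift; the same tension is why long-horizon RL recipes need some mechanism of this kind.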
👀👀
You 𝗗𝗢𝗡'𝗧 need so much data to train a Search Agent. Just 2.4k random samples: that's all it takes for our s3. Coming Soon. #LLM #RAG #SearchAgent #EMNLP2025 #NeurIPS2025 #ACL2025NLP #ACL2025 #NAACL2025 #AgenticRAG #AgenticAI #AgenticSearch
Awesome job! @AndrewZ45732491 @YiranWu18
❄️Introducing Absolute Zero Reasoner: our reasoner learns both to propose tasks that maximize learnability and to improve its reasoning by solving them, entirely through self-play, with no external data! Overall, it outperforms other "zero" models in math & coding domains. 🧵 1/
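A toy version of the propose-and-solve loop, assuming "learnability" is rewarded when proposed tasks are neither trivially easy nor unsolvable for the current solver. The difficulty knob, the reward shape, and the fixed stub solver are stand-ins (the real method trains both roles), not the paper's actual setup:

```python
import random

random.seed(0)
DIFFICULTIES = list(range(1, 9))          # task knob, e.g. digits to add

def propose(weights):
    """Proposer: samples a difficulty in proportion to learned weights."""
    total = sum(weights.values())
    r, acc = random.uniform(0, total), 0.0
    for d, w in weights.items():
        acc += w
        if r <= acc:
            return d
    return DIFFICULTIES[-1]

def solver_succeeds(difficulty, skill=3.0):
    """Stub solver: success probability falls off past its skill level."""
    p = 1.0 / (1.0 + 2.0 ** (difficulty - skill))
    return random.random() < p

def learnability(success_rate):
    """Peaks at ~50% solve rate: too easy or too hard teaches little."""
    return success_rate * (1.0 - success_rate)

weights = {d: 1.0 for d in DIFFICULTIES}
for step in range(2000):
    d = propose(weights)
    # Estimate the solver's success rate on this difficulty with a few rollouts.
    rate = sum(solver_succeeds(d) for _ in range(8)) / 8
    # Reinforce the proposer toward difficulties of intermediate success.
    weights[d] += learnability(rate)

print({d: round(w, 1) for d, w in weights.items()})
print("proposer concentrates near difficulty", max(weights, key=weights.get))
```

The proposer drifts toward tasks sitting at the solver's frontier; in the full self-play setting the solver also improves on those tasks, so the frontier (and the curriculum) keeps moving.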