Richard Pang
@yzpang_
http://yzpang.me; @AIatMeta Llama research, prev: NYU, Meta FAIR, @uchicago, @googleai; research: llm, text gen, alignment, reasoning, human-lm collab, etc.
I’ll be at #ICML2025 this week to present ScPO:
📌 Wednesday, July 16th, 11:00 AM-1:30 PM
📍 East Exhibition Hall A-B, E-2404
Stop by or reach out to chat about improving reasoning in LLMs, self-training, or just to swap tips about being on the job market next cycle! 😃
🚨 Self-Consistency Preference Optimization (ScPO) 🚨
- New self-training method without human labels – learn to make the model more consistent!
- Works well for reasoning tasks where RMs fail to evaluate correctness.
- Close to performance of supervised methods *without* labels,…
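For readers curious how consistency alone can yield a training signal, here is a minimal sketch of one way to build ScPO-style preference pairs, not the paper's exact recipe: sample several reasoning traces, treat the majority-vote answer as "chosen" and the least consistent one as "rejected", and weight the pair by the vote margin. `model.sample` and `extract_answer` are hypothetical stand-ins.

```python
from collections import Counter

def extract_answer(trace: str) -> str:
    # Hypothetical parser: assume the final line of a reasoning trace
    # holds the short final answer.
    return trace.strip().splitlines()[-1]

def build_scpo_pair(model, prompt, n_samples=16, temperature=0.8):
    # Sample several reasoning traces for the same problem.
    # `model.sample` is a hypothetical stand-in for any LM sampler.
    traces = [model.sample(prompt, temperature=temperature) for _ in range(n_samples)]

    # Vote over the extracted final answers.
    votes = Counter(extract_answer(t) for t in traces)
    ranked = votes.most_common()
    best_answer, best_count = ranked[0]
    worst_answer, worst_count = ranked[-1]
    if best_answer == worst_answer:
        return None  # every sample agrees: no informative pair

    # Most consistent answer -> "chosen"; least consistent -> "rejected".
    chosen = next(t for t in traces if extract_answer(t) == best_answer)
    rejected = next(t for t in traces if extract_answer(t) == worst_answer)

    # Weight the pair by the vote margin, so confident disagreements
    # contribute more to the preference loss.
    weight = (best_count - worst_count) / n_samples
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected, "weight": weight}
```

Note that no gold labels appear anywhere: the only supervision is the model's agreement with itself across samples.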
We worked on a whole line of research on this:
- Self-Rewarding LMs (use self as a Judge in semi-online DPO): arxiv.org/abs/2401.10020
- Thinking LLMs (learn CoTs with a Judge with semi-online DPO): arxiv.org/abs/2410.10630 *poster at ICML this week!!*
- Mix verifiable &…
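As a rough illustration of the "self as a Judge in semi-online DPO" recipe shared across these papers, here is a hedged sketch of one iteration; `model.sample`, `model.judge`, and `dpo_update` are hypothetical stand-ins, not the papers' actual APIs.

```python
def dpo_update(model, pairs):
    # Placeholder for one preference-optimization step on `pairs`
    # (in practice, e.g., a DPO gradient step); left abstract here.
    pass

def self_rewarding_iteration(model, prompts, n_candidates=4):
    pairs = []
    for prompt in prompts:
        # Generate candidates, then score them with the *same* model
        # acting as judge; `model.sample` / `model.judge` are hypothetical.
        candidates = [model.sample(prompt) for _ in range(n_candidates)]
        scores = [model.judge(prompt, c) for c in candidates]

        ranked = sorted(zip(scores, candidates), key=lambda x: x[0])
        (lo_score, lo_cand), (hi_score, hi_cand) = ranked[0], ranked[-1]
        if hi_score > lo_score:  # skip ties: no preference signal
            pairs.append({"prompt": prompt, "chosen": hi_cand, "rejected": lo_cand})

    # "Semi-online": the pairs come from the current model, which is
    # updated before the next batch of prompts is sampled.
    dpo_update(model, pairs)
    return pairs
```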
Still surprised this doesn’t start reward hacking
🎉 Excited to share that my internship work, ScPO, on self-training LLMs to improve reasoning without human labels, has been accepted to #ICML2025! Many thanks to my awesome collaborators at @AIatMeta and @uncnlp 🌞 Looking forward to presenting ScPO in Vancouver 🇨🇦
We're glad to start getting Llama 4 in all your hands. We're already hearing lots of great results people are getting with these models. That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were…
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.
Llama 4 Scout
• 17B-active-parameter model…
First set of Llama 4!!
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
Note: most MIT professors I know are honest and morally upright
New work where we show that with the right training distribution, transformers can learn to search and internally implement an exponential path-merging algorithm. But they struggle to learn to search as the graph size increases, and simple solutions like scaling don't resolve it.
🚨🔔Foundational graph search task as testbed: for some distribution, transformers can learn to search (100% acc). We interpreted their algo!! But as graph size ↑, transformers struggle. Scaling up # params does not help; CoT does not help. 1.5 years of learning in 10 pages!
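For a concrete sense of the testbed, here is one plausible way to generate a DAG-search example (edge list plus a start/goal reachability query, labeled by graph search); the serialization and parameters are illustrative, not the paper's exact format.

```python
import random

def make_search_example(n_vertices=8, edge_prob=0.3, seed=None):
    # One plausible DAG-search example: edge list + start/goal query,
    # labeled by reachability. Serialization is illustrative only.
    rng = random.Random(seed)

    # Random DAG: only allow edges u -> v with u < v, so no cycles.
    edges = [(u, v) for u in range(n_vertices) for v in range(u + 1, n_vertices)
             if rng.random() < edge_prob]
    start, goal = sorted(rng.sample(range(n_vertices), 2))  # start < goal

    # Label by reachability from start (iterative depth-first search).
    adj = {u: [] for u in range(n_vertices)}
    for u, v in edges:
        adj[u].append(v)
    stack, seen = [start], {start}
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)

    prompt = " ".join(f"{u}>{v}" for u, v in edges) + f" | {start}->{goal}?"
    return prompt, ("yes" if goal in seen else "no")
```

Scaling up `n_vertices` in a generator like this is the knob that exposes the failure mode the thread describes: accuracy degrades on larger graphs even as parameters increase.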
Search is a core operation for reasoning in large language models. Check out our new work, where we dive deep into the ability of transformer models to learn to search.
Transformers Struggle to Learn to Search
Finds that transformer-based LLMs struggle to perform search robustly. Suggests that, given the right training distribution, the transformer can learn to search. Also reports that performing search via in-context exploration (i.e.,…
Transformers Struggle to Learn to Search
Demonstrates transformers can be taught to perform graph search tasks but increasingly struggle on larger graphs, with this difficulty not resolved by increased model scale.
📝 arxiv.org/abs/2412.04703
I’ll be at NeurIPS this week! Presenting at the Thursday 4:30pm poster session and giving a spotlight talk at the AIDrugX workshop on Sunday. Also, I’ve finally joined 🦋. Come find me, both at NeurIPS and on 🦋! ☺️