Richard Pang
@yzpang_
http://yzpang.me; @AIatMeta Llama research, prev: NYU, Meta FAIR, @uchicago, @googleai; research: llm, text gen, alignment, reasoning, human-lm collab, etc.
I’ll be at #ICML2025 this week to present ScPO:
📌 Wednesday, July 16th, 11:00 AM-1:30 PM
📍 East Exhibition Hall A-B, E-2404
Stop by or reach out to chat about improving reasoning in LLMs, self-training, or just to swap tips about being on the job market next cycle! 😃
🚨 Self-Consistency Preference Optimization (ScPO) 🚨
- New self-training method without human labels – learn to make the model more consistent!
- Works well for reasoning tasks where RMs fail to evaluate correctness.
- Close to performance of supervised methods *without* labels,…
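For readers curious how consistency alone can yield a training signal, here is a minimal sketch of one way to build ScPO-style preference pairs, not the paper's exact recipe: sample several reasoning traces, treat the majority-vote answer as "chosen" and the least consistent one as "rejected", and weight the pair by the vote margin. `model.sample` and `extract_answer` are hypothetical stand-ins.

```python
from collections import Counter

def extract_answer(trace: str) -> str:
    # Hypothetical parser: assume the final line of a reasoning trace
    # holds the short final answer.
    return trace.strip().splitlines()[-1]

def build_scpo_pair(model, prompt, n_samples=16, temperature=0.8):
    # Sample several reasoning traces for the same problem.
    # `model.sample` is a hypothetical stand-in for any LM sampler.
    traces = [model.sample(prompt, temperature=temperature) for _ in range(n_samples)]

    # Vote over the extracted final answers.
    votes = Counter(extract_answer(t) for t in traces)
    ranked = votes.most_common()
    best_answer, best_count = ranked[0]
    worst_answer, worst_count = ranked[-1]
    if best_answer == worst_answer:
        return None  # every sample agrees: no informative pair

    # Most consistent answer -> "chosen"; least consistent -> "rejected".
    chosen = next(t for t in traces if extract_answer(t) == best_answer)
    rejected = next(t for t in traces if extract_answer(t) == worst_answer)

    # Weight the pair by the vote margin, so confident disagreements
    # contribute more to the preference loss.
    weight = (best_count - worst_count) / n_samples
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected, "weight": weight}
```

Note that no gold labels appear anywhere: the only supervision is the model's agreement with itself across samples.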
We worked on a whole line of research on this:
- Self-Rewarding LMs (use self as a Judge in semi-online DPO): arxiv.org/abs/2401.10020
- Thinking LLMs (learn CoTs with a Judge with semi-online DPO): arxiv.org/abs/2410.10630 *poster at ICML this week!!*
- Mix verifiable &…
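As a rough illustration of the "self as a Judge in semi-online DPO" recipe shared across these papers, here is a hedged sketch of one iteration; `model.sample`, `model.judge`, and `dpo_update` are hypothetical stand-ins, not the papers' actual APIs.

```python
def dpo_update(model, pairs):
    # Placeholder for one preference-optimization step on `pairs`
    # (in practice, e.g., a DPO gradient step); left abstract here.
    pass

def self_rewarding_iteration(model, prompts, n_candidates=4):
    pairs = []
    for prompt in prompts:
        # Generate candidates, then score them with the *same* model
        # acting as judge; `model.sample` / `model.judge` are hypothetical.
        candidates = [model.sample(prompt) for _ in range(n_candidates)]
        scores = [model.judge(prompt, c) for c in candidates]

        ranked = sorted(zip(scores, candidates), key=lambda x: x[0])
        (lo_score, lo_cand), (hi_score, hi_cand) = ranked[0], ranked[-1]
        if hi_score > lo_score:  # skip ties: no preference signal
            pairs.append({"prompt": prompt, "chosen": hi_cand, "rejected": lo_cand})

    # "Semi-online": the pairs come from the current model, which is
    # updated before the next batch of prompts is sampled.
    dpo_update(model, pairs)
    return pairs
```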
Still surprised this doesn’t start reward hacking
🎉 Excited to share that my internship work, ScPO, on self-training LLMs to improve reasoning without human labels, has been accepted to #ICML2025! Many thanks to my awesome collaborators at @AIatMeta and @uncnlp 🌞 Looking forward to presenting ScPO in Vancouver 🇨🇦
We're glad to start getting Llama 4 in all your hands. We're already hearing lots of great results people are getting with these models. That said, we're also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were…
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality.
Llama 4 Scout
• 17B-active-parameter model…
First set of Llama 4!!
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4…
Note: most MIT professors I know are honest and morally upright
New work where we show that with the right training distribution, transformers can learn to search and internally implement an exponential path-merging algorithm. But they struggle to learn to search as the graph size increases, and simple solutions like scaling don't resolve it.
🚨🔔Foundational graph search task as testbed: for some distribution, transformers can learn to search (100% acc). We interpreted their algo!! But as graph size ↑, transformers struggle. Scaling up # params does not help; CoT does not help. 1.5 years of learning in 10 pages!
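For a concrete sense of the testbed, here is one plausible way to generate a DAG-search example (edge list plus a start/goal reachability query, labeled by graph search); the serialization and parameters are illustrative, not the paper's exact format.

```python
import random

def make_search_example(n_vertices=8, edge_prob=0.3, seed=None):
    # One plausible DAG-search example: edge list + start/goal query,
    # labeled by reachability. Serialization is illustrative only.
    rng = random.Random(seed)

    # Random DAG: only allow edges u -> v with u < v, so no cycles.
    edges = [(u, v) for u in range(n_vertices) for v in range(u + 1, n_vertices)
             if rng.random() < edge_prob]
    start, goal = sorted(rng.sample(range(n_vertices), 2))  # start < goal

    # Label by reachability from start (iterative depth-first search).
    adj = {u: [] for u in range(n_vertices)}
    for u, v in edges:
        adj[u].append(v)
    stack, seen = [start], {start}
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)

    prompt = " ".join(f"{u}>{v}" for u, v in edges) + f" | {start}->{goal}?"
    return prompt, ("yes" if goal in seen else "no")
```

Scaling up `n_vertices` in a generator like this is the knob that exposes the failure mode the thread describes: accuracy degrades on larger graphs even as parameters increase.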
Search is a core operation for reasoning in large language models. Check out our new work, where we dive deep into the ability of transformer models to learn to search.
Transformers Struggle to Learn to Search
Finds that transformer-based LLMs struggle to perform search robustly. Suggests that, given the right training distribution, the transformer can learn to search. Also reports that performing search via in-context exploration (i.e.,…
Transformers Struggle to Learn to Search
Demonstrates transformers can be taught to perform graph search tasks but increasingly struggle on larger graphs, with this difficulty not resolved by increased model scale.
📝 arxiv.org/abs/2412.04703
I’ll be at NeurIPS this week! Presenting at the Thursday 4:30pm poster session and giving a spotlight talk at the AIDrugX workshop on Sunday. Also, I’ve finally joined 🦋. Come find me, both at NeurIPS and on 🦋! ☺️