Matthew Yang
@matthewyryang
MSML student @ CMU
🚨 NEW PAPER: What if LLMs could tackle harder problems - not by explicitly training on longer traces, but by learning how to think longer? Our recipe e3 teaches models to explore in-context, enabling LLMs to unlock longer reasoning chains without ever seeing them in training.…
Since R1 there has been a lot of chatter 💬 on post-training LLMs with RL. Is RL only sharpening the distribution over correct responses sampled by the pretrained LLM OR is it exploring and discovering new strategies 🤔? Find answers in our latest post ⬇️ tinyurl.com/rlshadis
I’m excited to be the Chief AI Officer of @Meta, working alongside @natfriedman, and thrilled to be accompanied by an incredible group of people joining on the same day. Towards superintelligence 🚀
Our view on test-time scaling has been to train models to discover algos that enable them to solve harder problems. @setlur_amrith & @matthewyryang's new work e3 shows how RL done with this view produces the best <2B LLM on math, one that extrapolates beyond its training budget. 🧵⬇️…
Introducing e3 🔥 Best <2B model on math 💪 Are LLMs implementing algos ⚒️ OR is thinking an illusion 🎩? Is RL only sharpening the base LLM distrib. 🤔 OR discovering novel strategies outside the base LLM 💡? We answer these ⤵️ 🚨 arxiv.org/abs/2506.09026 🚨 matthewyryang.github.io/e3/
🤔 How do you explain that when we apply RL to math problems, the incorrect answers become longer than the correct ones? We had this discussion this morning, and I'm curious to know what the community thinks about it.
"We weren’t born to do jobs." Bill Gates says jobs are a relic of human scarcity. In a world without shortages, society will be able to produce enough—food, healthcare, services—without everyone working. The real shift won’t be economic. It’ll be reprogramming how we think…
Oh my goodness. GPT-o1 got a perfect score on my @CarnegieMellon undergraduate #math exam, taking less than a minute to solve each problem. I freshly design non-standard problems for all of my exams, and they are open-book, open-notes. (Problems included below, with links to…
A lot of work focuses on test-time scaling. But we aren't scaling it optimally: simply training a long CoT doesn't mean we use it well. My students developed "v0" of a paradigm to do this optimally by running RL with dense rewards = minimizing regret over long CoT episodes. 🧵⬇️…
Scaling test-time compute is fine 😒 but are we making good use of it? 🤔 We try to answer this question in our new work: arxiv.org/pdf/2503.07572 TLDR; 🚀 *Optimizing* test-time compute = RL with dense (progress) rewards = minimizing regret over long CoT episodes 😲 🧵⤵️
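For concreteness, here's a minimal sketch (not the paper's implementation) of what "dense (progress) rewards" could look like, assuming progress is scored as the change in the estimated chance of answering correctly after each CoT segment; `success_prob` is a hypothetical estimator you'd have to supply:

```python
# Hypothetical sketch: dense "progress" reward over a long chain-of-thought.
# Assumes the CoT can be split into segments and that, for any prefix, we can
# estimate the probability the model would answer correctly if forced to stop there.

from typing import Callable, List

def progress_rewards(
    segments: List[str],
    success_prob: Callable[[str], float],  # hypothetical estimator: P(correct | prefix)
) -> List[float]:
    """Dense reward for each CoT segment = change in success probability,
    i.e. how much 'progress' that segment made toward the final answer."""
    rewards = []
    prefix = ""
    prev_p = success_prob(prefix)  # chance of answering correctly with no thinking
    for seg in segments:
        prefix += seg
        p = success_prob(prefix)
        rewards.append(p - prev_p)  # positive if the segment improved the odds
        prev_p = p
    return rewards
```

Note the per-segment rewards telescope to the final success probability minus the no-thinking baseline, so in this sketch the dense shaping mostly redistributes credit across the trace rather than changing the total.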
🚨 NEW PAPER: "Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning"! 🤔 With all these long-reasoning LLMs, what are we actually optimizing for? Length penalties? Token budgets? We needed a better way to think about it! Website: cohenqu.github.io/mrt.github.io/ 🧵[1/9]
We reproduced DeepSeek R1-Zero in the CountDown game, and it just works. Through RL, the 3B base LM develops self-verification and search abilities all on its own. You can experience the Aha moment yourself for < $30. Code: github.com/Jiayi-Pan/Tiny… Here's what we learned 🧵
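For anyone curious what the RL signal looks like in a CountDown-style setup, here's a hedged sketch of a rule-based reward in that spirit; the `<answer>` tag, the scoring values, and the function name are illustrative assumptions, not the TinyZero code:

```python
# Hypothetical sketch of a rule-based reward for a Countdown-style task:
# the model must combine the given numbers with +, -, *, / to hit the target.
import re
from collections import Counter

def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    """Return 1.0 if the proposed equation uses exactly the given numbers
    and evaluates to the target; a small partial reward otherwise."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0                      # no parsable answer at all
    expr = match.group(1).strip()
    if not re.fullmatch(r"[\d+\-*/() .]+", expr):
        return 0.1                      # answer tags present, but invalid characters
    used = Counter(int(n) for n in re.findall(r"\d+", expr))
    if used != Counter(numbers):
        return 0.1                      # must use each given number exactly once
    try:
        value = eval(expr)              # expr is restricted to digits and arithmetic operators
    except (SyntaxError, ZeroDivisionError):
        return 0.1
    return 1.0 if abs(value - target) < 1e-6 else 0.1
```

With only this kind of verifiable, outcome-level reward, the base LM is left to discover self-verification and search on its own during RL.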