Yuxiao Qu
@QuYuxiao
PhD @mldcmu, advised by @aviral_kumar2 and @rsalakhu. Interests: Reasoning, RL & FMs. Prev: @UWMadison, @UW, @CUHKofficial
If you are at #icml25 and are interested in RL algorithms, scaling laws for RL, and test-time scaling (& related stuff), come talk to us at various poster sessions (details ⬇️). We are also presenting some things at workshops later in the week, more on that later.
Heading to @icmlconf #ICML2025 this week! DM me if you’d like to chat ☕️ Come by our poster sessions on: 🧠 Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (arxiv.org/abs/2503.07572) 🔍 Learning to Discover Abstractions for LLM Reasoning (drive.google.com/file/d/1Sfafrk…)

✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
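
A rough sketch of the kind of black-box refinement loop the tweet describes: propose a prompt from the inspo image, render it with the black-box generator, score the result, and refine. This is an illustrative loop only, assuming hypothetical `describe_image`, `generate_image`, and `score_similarity` helpers; it is not PRISM's actual API or algorithm.

```python
# Hedged sketch of a black-box prompt-refinement loop (illustration only;
# the three callables below are hypothetical placeholders, not PRISM's API).
from typing import Callable

def refine_prompt(
    reference_image: bytes,
    describe_image: Callable[[bytes, str], str],        # VLM: (image, feedback) -> prompt
    generate_image: Callable[[str], bytes],             # black-box T2I: prompt -> image
    score_similarity: Callable[[bytes, bytes], float],  # judge: (ref, candidate) -> [0, 1]
    n_iters: int = 5,
) -> str:
    """Return the best human-readable prompt found for the reference image."""
    best_prompt, best_score = "", -1.0
    feedback = "Describe this image as a text-to-image prompt."
    for _ in range(n_iters):
        prompt = describe_image(reference_image, feedback)
        candidate = generate_image(prompt)
        score = score_similarity(reference_image, candidate)
        if score > best_score:
            best_prompt, best_score = prompt, score
        # Feed the score back so the next proposal can improve on the last one.
        feedback = f"Previous prompt scored {score:.2f}; revise it to match the image better."
    return best_prompt
```
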
I am excited to give an oral talk on our work about “Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning” at #ICLR2025 FM-Wild Workshop! 🚀 📍Hall 4 #6 🕚11:30AM, April 27th 🖥️Can’t be there in person, but chat with @ianwu97 who’ll present our poster after the talk!

There’s a lot of awesome research about LLM reasoning right now. But how is learning in the physical world 🤖 different from learning in language 📚? In a new paper, we show that imitation learning in continuous spaces can be exponentially harder than for discrete state spaces, even when…
Scaling test-time compute is fine 😒 but are we making good use of it? 🤔 We try to answer this question in our new work: arxiv.org/pdf/2503.07572 TLDR; 🚀 *Optimizing* test-time compute = RL with dense (progress) rewards = minimizing regret over long CoT episodes 😲 🧵⤵️
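
A minimal sketch of the regret framing in the tweet, in my own shorthand rather than the paper's exact notation:

```latex
% My notation: a long CoT on prompt x is split into k episodes z_1, ..., z_k;
% \mu(\cdot \mid x, z_{0:j-1}) is the LLM's policy after j-1 episodes of thinking,
% \pi^{*} is the best comparator, and J(\cdot) is expected final-answer reward.
\[
  \mathrm{Regret}_k(x) \;=\; \sum_{j=1}^{k}
  \Bigl[\, J\!\bigl(\pi^{*}\bigr) \;-\; J\!\bigl(\mu(\cdot \mid x, z_{0:j-1})\bigr) \Bigr].
\]
% "Making good use" of test-time compute then means keeping this quantity small as k
% grows: dense progress rewards credit each episode by how much it shrinks the gap,
% instead of only rewarding the final answer.
```
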
Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4
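
To make "pure inference-time gradient descent on the target puzzle" concrete, here is a toy, hedged sketch: a small randomly initialized network is fit from scratch to one puzzle's demonstration pairs at test time, with no pretraining and no external data. The toy puzzle (increment every color mod 10), grid encoding, and model are my placeholder assumptions, not the authors' implementation.

```python
# Hedged sketch of per-puzzle, inference-time training (illustration only).
import torch
import torch.nn as nn

# Toy "puzzle": four demo pairs mapping a 3x3 grid of colors {0..9} to the same
# grid with every color incremented mod 10 (a stand-in for a real ARC task).
inputs = [torch.randint(0, 10, (3, 3)) for _ in range(4)]
demos = [(x, (x + 1) % 10) for x in inputs]
test_input = torch.randint(0, 10, (3, 3))

# Small model trained from scratch for this one puzzle: embed each cell's color,
# predict that cell's output color (a cell-wise mapping suffices for the toy task).
model = nn.Sequential(nn.Embedding(10, 32), nn.Linear(32, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(300):  # the "inference-time" optimization loop
    opt.zero_grad()
    loss = torch.zeros(())
    for x, y in demos:
        logits = model(x.flatten())              # (9, 10) per-cell class scores
        loss = loss + loss_fn(logits, y.flatten())
    loss.backward()
    opt.step()

# Predict the held-out test grid with the puzzle-specific model.
pred = model(test_input.flatten()).argmax(-1).view(3, 3)
print(pred)
```
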
🚨🚨 Preprint Alert 🚨🚨 🚀🚀 As AI systems become agents 🤖, how can we reliably delegate tasks to them if they cannot communicate their limitations 😭 or ask for help or more test-time compute 🧑🚒 when needed? We present our new pre-print **Self-Regulation and Requesting Interventions**…
blog.ml.cmu.edu/2025/01/08/opt… How can we train LLMs to solve complex challenges beyond just data scaling? In a new blog post, @setlur_amrith, @QuYuxiao, Matthew Yang, @LunjunZhang, @gingsmith and @aviral_kumar2 demonstrate that meta-RL can help LLMs better optimize test-time compute.
At #NeurIPS2024 main conf, we will present several works on understanding offline RL methods, RL for LLM reasoning, agents, etc. led by my students and collaborators. Come talk to us to learn more and discuss future directions + what we are excited about! More details in 🧵⬇️
I’ll be at #NeurIPS2024 next week to present our work on 📎Recursive Introspection: Teaching Language Model Agents How to Self-Improve 📌Poster Session 3 East #2805 🗓️Dec 12, 11:00-2:00 This is joint work with amazing collaborators @tianjun_zhang, Naman, @aviral_kumar2
