Yuxiao Qu
@QuYuxiao
PhD @mldcmu, advised by @aviral_kumar2 and @rsalakhu. Interests: Reasoning, RL & FMs. Prev: @UWMadison, @UW, @CUHKofficial
If you are at #icml25 and are interested in RL algorithms, scaling laws for RL, and test-time scaling (& related stuff), come talk to us at various poster sessions (details ⬇️). We are also presenting some things at workshops later in the week, more on that later.
Heading to @icmlconf #ICML2025 this week! DM me if you’d like to chat ☕️ Come by our poster sessions on: 🧠 Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning (arxiv.org/abs/2503.07572) 🔍 Learning to Discover Abstractions for LLM Reasoning (drive.google.com/file/d/1Sfafrk…)

✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
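
A rough sketch of the kind of black-box refinement loop the tweet describes: propose a prompt from the inspo image, render it with the black-box generator, score the result, and refine. This is an illustrative loop only, assuming hypothetical `describe_image`, `generate_image`, and `score_similarity` helpers; it is not PRISM's actual API or algorithm.

```python
# Hedged sketch of a black-box prompt-refinement loop (illustration only;
# the three callables below are hypothetical placeholders, not PRISM's API).
from typing import Callable

def refine_prompt(
    reference_image: bytes,
    describe_image: Callable[[bytes, str], str],        # VLM: (image, feedback) -> prompt
    generate_image: Callable[[str], bytes],             # black-box T2I: prompt -> image
    score_similarity: Callable[[bytes, bytes], float],  # judge: (ref, candidate) -> [0, 1]
    n_iters: int = 5,
) -> str:
    """Return the best human-readable prompt found for the reference image."""
    best_prompt, best_score = "", -1.0
    feedback = "Describe this image as a text-to-image prompt."
    for _ in range(n_iters):
        prompt = describe_image(reference_image, feedback)
        candidate = generate_image(prompt)
        score = score_similarity(reference_image, candidate)
        if score > best_score:
            best_prompt, best_score = prompt, score
        # Feed the score back so the next proposal can improve on the last one.
        feedback = f"Previous prompt scored {score:.2f}; revise it to match the image better."
    return best_prompt
```
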
I am excited to give an oral talk on our work about “Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning” at #ICLR2025 FM-Wild Workshop! 🚀 📍Hall 4 #6 🕚11:30AM, April 27th 🖥️Can’t be there in person, but chat with @ianwu97 who’ll present our poster after the talk!

There’s a lot of awesome research about LLM reasoning right now. But how is learning in the physical world 🤖 different from learning in language 📚? In a new paper, we show that imitation learning in continuous spaces can be exponentially harder than for discrete state spaces, even when…
Scaling test-time compute is fine 😒 but are we making good use of it? 🤔 We try to answer this question in our new work: arxiv.org/pdf/2503.07572 TLDR; 🚀 *Optimizing* test-time compute = RL with dense (progress) rewards = minimizing regret over long CoT episodes 😲 🧵⤵️
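
A minimal sketch of the regret framing in the tweet, in my own shorthand rather than the paper's exact notation:

```latex
% My notation: a long CoT on prompt x is split into k episodes z_1, ..., z_k;
% \mu(\cdot \mid x, z_{0:j-1}) is the LLM's policy after j-1 episodes of thinking,
% \pi^{*} is the best comparator, and J(\cdot) is expected final-answer reward.
\[
  \mathrm{Regret}_k(x) \;=\; \sum_{j=1}^{k}
  \Bigl[\, J\!\bigl(\pi^{*}\bigr) \;-\; J\!\bigl(\mu(\cdot \mid x, z_{0:j-1})\bigr) \Bigr].
\]
% "Making good use" of test-time compute then means keeping this quantity small as k
% grows: dense progress rewards credit each episode by how much it shrinks the gap,
% instead of only rewarding the final answer.
```
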
Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4
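
To make "pure inference-time gradient descent on the target puzzle" concrete, here is a toy, hedged sketch: a small randomly initialized network is fit from scratch to one puzzle's demonstration pairs at test time, with no pretraining and no external data. The toy puzzle (increment every color mod 10), grid encoding, and model are my placeholder assumptions, not the authors' implementation.

```python
# Hedged sketch of per-puzzle, inference-time training (illustration only).
import torch
import torch.nn as nn

# Toy "puzzle": four demo pairs mapping a 3x3 grid of colors {0..9} to the same
# grid with every color incremented mod 10 (a stand-in for a real ARC task).
inputs = [torch.randint(0, 10, (3, 3)) for _ in range(4)]
demos = [(x, (x + 1) % 10) for x in inputs]
test_input = torch.randint(0, 10, (3, 3))

# Small model trained from scratch for this one puzzle: embed each cell's color,
# predict that cell's output color (a cell-wise mapping suffices for the toy task).
model = nn.Sequential(nn.Embedding(10, 32), nn.Linear(32, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(300):  # the "inference-time" optimization loop
    opt.zero_grad()
    loss = torch.zeros(())
    for x, y in demos:
        logits = model(x.flatten())              # (9, 10) per-cell class scores
        loss = loss + loss_fn(logits, y.flatten())
    loss.backward()
    opt.step()

# Predict the held-out test grid with the puzzle-specific model.
pred = model(test_input.flatten()).argmax(-1).view(3, 3)
print(pred)
```
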
🚨🚨 Preprint Alert 🚨🚨 🚀🚀 As AI systems become agents 🤖, how can we reliably delegate tasks to them if they cannot communicate their limitations 😭 or ask for help or more test-time compute 🧑🚒 when needed? We present our new pre-print **Self-Regulation and Requesting Interventions**…
blog.ml.cmu.edu/2025/01/08/opt… How can we train LLMs to solve complex challenges beyond just data scaling? In a new blog post, @setlur_amrith, @QuYuxiao, Matthew Yang, @LunjunZhang, @gingsmith and @aviral_kumar2 demonstrate that meta-RL can help LLMs better optimize test-time compute.
At #NeurIPS2024 main conf, we will present several works on understanding offline RL methods, RL for LLM reasoning, agents, etc. led by my students and collaborators. Come talk to us to learn more and discuss future directions + what we are excited about! More details in 🧵⬇️
I’ll be at #NeurIPS2024 next week to present our work on 📎Recursive Introspection: Teaching Language Model Agents How to Self-Improve 📌Poster Session 3 East #2805 🗓️Dec 12, 11:00-2:00 This is joint work with amazing collaborators @tianjun_zhang, Naman, @aviral_kumar2
