Amrith Setlur
@setlur_amrith
PhD student at CMU.
Since R1, there has been a lot of chatter 💬 about post-training LLMs with RL. Is RL only sharpening the distribution over correct responses already sampled by the pretrained LLM, OR is it exploring and discovering new strategies 🤔? Find answers in our latest post ⬇️ tinyurl.com/rlshadis
Join us on July 19th at @icmlconf, Vancouver, for EXAIT: a full-day workshop on the role of exploration in AI today.
Attend our Scaling Self-Improvement workshop at @iclr_conf (Garnet 214-215) for some amazing talks and a fiery panel discussion (5-6pm) 🔥
With a stellar lineup of speakers and panelists, including Yoshua Bengio 🙀, the Scaling Self-Improving Foundation Models workshop at @iclr_conf promises to be 🔥 ⏰ Sunday, April 27 📍 Garnet 214-215
I couldn't make it to @iclr_conf, but if you are interested in process verifiers that can boost exploration and get LLMs to solve hard problems, check out our spotlight poster on PAVs at 3pm, Hall 3+2B #548. Also chat with the amazing @ianwu97, who will be presenting on our behalf!

It's easy to (pre-)train LLMs by imitating discrete actions (next tokens). Surprisingly, imitating *continuous* actions (e.g., in robots 🤖) is "exponentially" hard for *any* algorithm 🤯 that only uses expert data, even when the expert is deterministic 🙀! Check out this cool work:
There’s a lot of awesome research about LLM reasoning right now. But how is learning in the physical world 🤖 different from learning in language 📚? In a new paper, we show that imitation learning in continuous spaces can be exponentially harder than in discrete state spaces, even when…