Sergey Levine
@svlevine
Associate Professor at UC Berkeley. Co-founder, Physical Intelligence
We’re organizing the RoboArena Challenge at CoRL this year! Show the performance of your best generalist policy in a fair, open benchmark for the robotics community! 🤖 Sign up, even if you don’t have a robot! More details in 🧵👇
We're nearing the GPT moment for robotics. But to get there, we need models that combine broad generalization with high task success. That’s why I’m covering @physical_int – the company building a foundation model for robotic intelligence – in this edition of Startups to Join.
How can we train a foundation model to internalize what it means to “explore”? Come check out our work on “behavioral exploration” at ICML25 to find out!
Generalist policies are getting better, but they are far from perfect. In new domains, fast exploration and adaptation are essential (even humans need this!). How can agents explore *meaningfully*? Can they explore without gradient updates? Check out Behavioral Exploration 🧵👇
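For intuition about the "explore without gradient updates" question above, here is a toy, purely illustrative Python sketch (not the paper's method or code): a frozen, history-conditioned policy adapts at test time only by growing its context. `DummyEnv` and `ContextPolicy` are hypothetical stand-ins.

```python
# Hypothetical sketch: exploration with NO gradient updates at test time.
# A policy pretrained to imitate "exploratory" behavior is conditioned on the
# history of what it has already tried in the current deployment, so adaptation
# happens purely in-context. All names here are illustrative, not the paper's code.
import numpy as np

class DummyEnv:
    """Toy stand-in for a new test-time domain (2-armed bandit-like task)."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.good_arm = self.rng.integers(2)  # unknown to the agent

    def step(self, action):
        return float(action == self.good_arm)

class ContextPolicy:
    """Stand-in for a pretrained history-conditioned policy.

    Here it just avoids repeating unrewarded actions; a real model would be a
    sequence model trained offline to produce coverage-maximizing behavior."""
    def act(self, context, rng):
        tried_and_failed = {a for (a, r) in context if r == 0.0}
        candidates = [a for a in (0, 1) if a not in tried_and_failed] or [0, 1]
        return int(rng.choice(candidates))

# Test-time loop: the policy's weights are frozen; only the context grows.
env, policy, rng = DummyEnv(), ContextPolicy(), np.random.default_rng(1)
context = []  # history of (action, reward) pairs observed in this domain
for t in range(5):
    a = policy.act(context, rng)
    r = env.step(a)
    context.append((a, r))  # adaptation = appending to context, no gradients
print(context)
```

The point of the sketch is the loop at the bottom: the only thing that changes during deployment is the context, never the weights.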
Flow Q-learning (FQL) is a simple method to train/fine-tune an expressive flow policy with RL. Come visit our poster at 4:30p-7p this Wed (evening session, 2nd day)!
Excited to introduce flow Q-learning (FQL)! Flow Q-learning is a *simple* and scalable data-driven RL method that trains an expressive policy with flow matching. Paper: arxiv.org/abs/2502.02538 Project page: seohong.me/projects/fql/ Thread ↓
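This is not the authors' implementation, just a loose Python sketch of the two ingredients the tweet names for data-driven RL: a TD-trained critic and an expressive policy trained with a flow-matching objective. The critic target is simplified to a SARSA-style update on dataset actions, and everything beyond these two losses (e.g., how actions are actually sampled and optimized against Q) is omitted; see the paper for the real recipe.

```python
# Minimal, illustrative sketch (not the authors' code): a TD loss for Q(s, a)
# plus a flow-matching behavior-cloning loss for an expressive policy.
import torch, torch.nn as nn

obs_dim, act_dim = 8, 2

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 256), nn.ReLU(), nn.Linear(256, out))

q_net = mlp(obs_dim + act_dim, 1)                 # critic Q(s, a)
vel_net = mlp(obs_dim + act_dim + 1, act_dim)     # flow velocity v(s, a_t, t)
opt = torch.optim.Adam([*q_net.parameters(), *vel_net.parameters()], lr=3e-4)

def losses(batch, gamma=0.99):
    s, a, r, s2, a2 = batch  # next action a2 taken from the dataset (simplification)
    # --- TD loss for the critic ---
    with torch.no_grad():
        target = r + gamma * q_net(torch.cat([s2, a2], -1)).squeeze(-1)
    td = ((q_net(torch.cat([s, a], -1)).squeeze(-1) - target) ** 2).mean()
    # --- flow matching: regress the straight-line velocity from noise to data actions ---
    x0 = torch.randn_like(a)                       # noise sample
    t = torch.rand(a.shape[0], 1)                  # interpolation time in [0, 1]
    xt = (1 - t) * x0 + t * a                      # linear interpolant
    fm = ((vel_net(torch.cat([s, xt, t], -1)) - (a - x0)) ** 2).mean()
    return td + fm

# one gradient step on a fake batch, just to show the shapes line up
B = 32
batch = (torch.randn(B, obs_dim), torch.randn(B, act_dim), torch.randn(B),
         torch.randn(B, obs_dim), torch.randn(B, act_dim))
opt.zero_grad(); losses(batch).backward(); opt.step()
```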
I'm at ICML '25! Come check out our benchmark LMRL-Gym for multi-turn RL for LLMs at Wednesday's Poster Session. In addition to dialogue & text game tasks, we share a methodology for synthetic data generation to develop RL algorithms. Paper & code here: lmrl-gym.github.io
Jul 16 Wed 4:30-7, W-713 (same day same spot :D) 2) Flow Q-Learning: w/ @seohong_park (lead), @svlevine x.com/seohong_park/s…
Excited to be in Vancouver attending ICML this week to present some papers! Jul 16 Wed 11:30-1, W-713 1) Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration: w/ @wilcoxsonmax (co-lead), @kvfrans, @svlevine x.com/qiyang_li/stat…
We just made a major update to our work on leveraging prior trajectory data with *no* reward labels for online RL exploration. Our method (SUPE) shows strong performance across 42 long-horizon, sparse-reward tasks. How does it work? 🧵1/N
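As a rough structural illustration only (the thread and paper describe the actual method), here is a sketch of the two-phase idea in the title: extract skills offline from unlabeled trajectories, then explore online by composing those skills as temporally extended actions. `SkillLibrary` and the random high-level choice are hypothetical stubs, not SUPE's components.

```python
# Rough structural sketch: offline skill extraction from unlabeled data,
# then online exploration that strings those skills together.
import numpy as np

rng = np.random.default_rng(0)

# --- Phase 1: offline, no reward labels needed ---
unlabeled_trajectories = [rng.standard_normal((20, 4)) for _ in range(100)]  # states only

class SkillLibrary:
    """Stand-in for skills distilled from unlabeled data. Here a "skill" is just a
    stored state segment; a real method would learn a latent-conditioned low-level policy."""
    def __init__(self, trajectories, n_skills=8, horizon=5):
        self.skills = [t[:horizon] for t in trajectories[:n_skills]]

    def execute(self, skill_id):
        return self.skills[skill_id]  # pretend to roll the skill out in the environment

# --- Phase 2: online, sparse-reward task ---
skills = SkillLibrary(unlabeled_trajectories)
visited = []
for episode in range(3):
    for step in range(4):
        skill_id = rng.integers(len(skills.skills))  # a learned high-level policy would pick this
        visited.append(skills.execute(skill_id))     # temporally extended action => deeper exploration
        # the environment's (sparse) reward would update the high-level policy here
print(len(visited), "skill executions")
```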
Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N
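As a hedged sketch of what "action chunking for RL" could look like mechanically (the shapes and names below are assumptions for illustration, not the recipe from the linked page): the critic scores a whole length-h chunk of actions, and the TD backup jumps h environment steps at a time, which lets value information and temporally coherent exploration propagate faster.

```python
# Hedged sketch: treat a length-h chunk of actions as a single RL action, so the
# critic evaluates whole chunks and TD targets span h environment steps.
import torch, torch.nn as nn

obs_dim, act_dim, h, gamma = 8, 2, 4, 0.99
chunk_dim = act_dim * h

q_net = nn.Sequential(nn.Linear(obs_dim + chunk_dim, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)

def chunked_td_loss(s, chunk, rewards, s_next, next_chunk):
    """s: (B, obs_dim); chunk: (B, h*act_dim); rewards: (B, h) rewards collected
    while executing the chunk; s_next: state h steps later; next_chunk: chunk taken there."""
    discounts = gamma ** torch.arange(h, dtype=torch.float32)
    n_step_return = (rewards * discounts).sum(-1)        # discounted sum inside the chunk
    with torch.no_grad():
        target = n_step_return + (gamma ** h) * q_net(torch.cat([s_next, next_chunk], -1)).squeeze(-1)
    q = q_net(torch.cat([s, chunk], -1)).squeeze(-1)
    return ((q - target) ** 2).mean()

# one gradient step on a fake batch, just to show the shapes line up
B = 32
loss = chunked_td_loss(torch.randn(B, obs_dim), torch.randn(B, chunk_dim),
                       torch.randn(B, h), torch.randn(B, obs_dim), torch.randn(B, chunk_dim))
opt.zero_grad(); loss.backward(); opt.step()
```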