Qiyang Li
@qiyang_li
Ph.D. Student @Berkeley_AI
Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N
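To make the idea concrete, here is a minimal sketch (my own toy, not the paper's released code) of the environment-side view of chunking: a wrapper that treats a length-h chunk as one macro action. It assumes the classic gym 4-tuple step API, and `ChunkedEnv` / `chunk_size` are names I made up for illustration.

```python
import numpy as np

class ChunkedEnv:
    """Treat a length-h action chunk as a single macro action."""

    def __init__(self, env, chunk_size=5):
        self.env = env
        self.h = chunk_size

    def reset(self):
        return self.env.reset()

    def step(self, action_chunk):
        # action_chunk: (h, action_dim), executed open-loop. Sampling
        # the whole chunk at once yields temporally coherent exploration
        # instead of independent per-step noise.
        total_reward, done, obs, info = 0.0, False, None, {}
        for a in np.asarray(action_chunk):
            obs, reward, done, info = self.env.step(a)
            total_reward += reward  # a full version would discount within the chunk
            if done:
                break
        return obs, total_reward, done, info
```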
I wrote a fun little article about all the ways to dodge the need for real-world robot data. I think it has a cute title. sergeylevine.substack.com/p/sporks-of-agi
Join us on July 19th at @icmlconf, Vancouver, for the EXAIT Workshop, a full-day workshop on the role of exploration in AI today.
Check out our poster on “Reinforcement Learning with Action Chunking” tomorrow 11:45-14:15 @ EXAIT Workshop (Meeting Room 205-207)!
Had so much fun working on this😊 PyTorch and JAX implementations are both out!
For everyone interested in precise camera control in transformers (e.g., video / world models): stop settling for Plücker raymaps -- use camera-aware relative PE in your attention layers, like RoPE (for LLMs) but for cameras! Paper & code: liruilong.cn/prope/
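For intuition, here is a toy numpy sketch of the RoPE-for-cameras idea: make attention depend on relative rather than absolute camera poses. This is not PRoPE's actual formulation (see the paper for that); the flattened-pose bias and the projection `W` are simplifications I made up.

```python
import numpy as np

def relative_pose(T_i, T_j):
    # SE(3) transform of camera j expressed in camera i's frame;
    # invariant to any global rigid motion applied to both cameras.
    return np.linalg.inv(T_i) @ T_j

def camera_relative_bias(poses, W):
    # poses: (n, 4, 4) camera-to-world matrices, one per token/view.
    # W: (16,) projection of the flattened relative pose to a scalar.
    # Returns an (n, n) additive bias for pre-softmax attention logits.
    n = poses.shape[0]
    bias = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            bias[i, j] = relative_pose(poses[i], poses[j]).reshape(-1) @ W
    return bias

poses = np.stack([np.eye(4)] * 3)  # three identical cameras
bias = camera_relative_bias(poses, np.random.randn(16) * 0.01)
```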
How can we train a foundation model to internalize what it means to “explore”? Come check out our work on “behavioral exploration” at ICML25 to find out!
Fine-tuning pre-trained robotic models with online RL requires a way to train RL with expressive policies. Can we design an effective method for this? We propose EXPO, a sample-efficient online RL algorithm that enables stable fine-tuning of expressive policy classes. (1/6)
Check out our poster 11:00-1:30 today @ West Exhibition Hall #W-713!
Excited to be in Vancouver attending ICML this week to present some papers! Jul 16 (Wed) 11:30-1, W-713: 1) Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration, w/ @wilcoxsonmax (co-lead), @kvfrans, @svlevine x.com/qiyang_li/stat…
Check out our poster on Wednesday 4:30p-7p (West Exhibition Hall, #713)!!
Flow Q-learning (FQL) is a simple method to train/fine-tune an expressive flow policy with RL. Come visit our poster at 4:30p-7p this Wed (evening session, 2nd day)!
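For readers who want the gist in code, here is a compressed PyTorch sketch of the recipe as I understand it from the paper: fit a flow-matching policy to data, then distill it into a one-step policy that is also pushed uphill on Q. Network sizes, the single Euler step, and `alpha` are placeholder choices, not the reference implementation.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 17, 6
# velocity field v(s, x_t, t) for the expressive flow policy
flow = nn.Sequential(nn.Linear(obs_dim + act_dim + 1, 256), nn.ReLU(),
                     nn.Linear(256, act_dim))
# one-step policy: maps (s, noise) directly to an action
one_step = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                         nn.Linear(256, act_dim))
# critic Q(s, a)
q_fn = nn.Sequential(nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
                     nn.Linear(256, 1))

def flow_matching_loss(s, a):
    # Conditional flow matching: regress the velocity that carries
    # noise x0 toward the dataset action a along a straight line.
    x0 = torch.randn_like(a)
    t = torch.rand(a.shape[0], 1)
    x_t = (1 - t) * x0 + t * a
    v = flow(torch.cat([s, x_t, t], -1))
    return ((v - (a - x0)) ** 2).mean()

def actor_loss(s, alpha=1.0):
    # Distill the flow policy into the one-step policy and add a
    # Q-maximization term, so RL updates need one network call
    # instead of an ODE solve.
    z = torch.randn(s.shape[0], act_dim)
    with torch.no_grad():
        t0 = torch.zeros(s.shape[0], 1)
        a_flow = z + flow(torch.cat([s, z, t0], -1))  # one Euler step for brevity
    a = one_step(torch.cat([s, z], -1))
    distill = ((a - a_flow) ** 2).mean()
    return distill - alpha * q_fn(torch.cat([s, a], -1)).mean()

s, a = torch.randn(32, obs_dim), torch.randn(32, act_dim)
loss = flow_matching_loss(s, a) + actor_loss(s)
```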
I'm at ICML '25! Come check out our benchmark LMRL-Gym for multi-turn RL for LLMs at Wednesday's Poster Session. In addition to dialogue & text game tasks, we share a methodology for synthetic data generation to develop RL algorithms. Paper & code here: lmrl-gym.github.io
@qiyang_li will help present OTTER tomorrow at #ICML2025! A lightweight, instruction-following VLA! See OG post below! 👉Code already released at ottervla.github.io Poster will be presented at West Exhibition Hall B2-B3 #W-409 Tue 15 Jul 11 a.m. PDT — 1:30 p.m. PDT
1/N Most Vision-Language-Action models need tons of data for finetuning, and still fail for new objects and instructions. Introducing OTTER, a lightweight, easy-to-train model that uses text-aware visual features to nail unseen tasks out of the box! Here's how it works 👇
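Here is a toy sketch of what "text-aware visual features" could look like (an illustration of the general idea, not OTTER's actual architecture): let the instruction embedding attend over frozen vision-encoder patch features, so the policy sees only instruction-relevant visual content.

```python
import numpy as np

def text_aware_features(patch_feats, text_emb):
    # patch_feats: (num_patches, d) from a frozen vision encoder.
    # text_emb: (d,) instruction embedding from a frozen text encoder.
    scores = patch_feats @ text_emb / np.sqrt(patch_feats.shape[1])
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                   # softmax over patches
    return attn @ patch_feats            # (d,) instruction-weighted summary
```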
Action chunking + expressive action distribution → better exploration for RL! This was one of the biggest lessons we learned in DPPO as well
Action chunking works really well in imitation learning, and is essential to learning good BC policies in robotics. Can/should we apply the same idea in RL? We find that RL in the action chunk space, when done right (we call it ✨Q-chunking ✨), can be highly efficient🧵👇
Action chunking is a great idea in robotics: by getting a model to produce a short sequence of actions, it _just works better_ for some mysterious reason. Now it turns out this can help in RL too, and it's a bit clearer why: action chunks help explore and help with backups. 🧵👇
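The backup intuition is easy to see in code. Below is a minimal sketch (names are mine, not the repo's) of the TD target when the critic is defined over length-h action chunks: one backup spans h environment steps, so value information propagates h times faster along a trajectory.

```python
import numpy as np

def chunk_td_target(rewards, next_q, gamma=0.99):
    # rewards: (h,) rewards collected while executing one chunk.
    # next_q: Q(s_{t+h}, a'_{t+h:t+2h}) from the target critic.
    h = len(rewards)
    discounts = gamma ** np.arange(h)
    return float(discounts @ rewards + gamma ** h * next_q)

target = chunk_td_target(np.ones(5), next_q=10.0)  # one 5-step chunk
```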