Paul Zhou
@zhiyuan_zhou_
phd @berkeley_ai, research intern @physical_int
We tested WSRL (Warm-start RL) on a Franka robot, and it leads to really efficient online RL fine-tuning in the real world! WSRL learned the peg insertion task perfectly with only 11 minutes of warmup and *7 minutes* of online RL interactions 👇🧵
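For readers curious what the warmup-then-online-RL loop looks like, here is a minimal sketch of that two-phase structure. It is not the WSRL codebase; the `env`, `agent`, and `replay_buffer` interfaces and the idea of collecting warmup data with the pre-trained policy before starting updates are assumptions made for illustration.

```python
# A minimal sketch of a warm-start loop (illustrative interfaces, not the WSRL code):
# phase 1 seeds the replay buffer with the pre-trained policy, phase 2 fine-tunes online.

def warm_start_rl(env, agent, replay_buffer, warmup_steps, online_steps):
    obs = env.reset()

    # Warmup phase: collect transitions with the pre-trained policy, no gradient updates yet.
    for _ in range(warmup_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        replay_buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs

    # Online phase: keep collecting and run standard off-policy updates from the buffer.
    for _ in range(online_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        replay_buffer.add(obs, action, reward, next_obs, done)
        agent.update(replay_buffer.sample())
        obs = env.reset() if done else next_obs

    return agent
```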
Check out our poster on “Reinforcement Learning with Action Chunking” tomorrow 11:45-14:15 @ EXAIT Workshop (Meeting Room 205-207)!
Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N
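As a rough picture of what "RL in the action chunk space" means, the sketch below has the policy emit a chunk of H low-level actions, executes the chunk open-loop, and stores one chunk-level transition per chunk. The `policy.sample_chunk` interface, the chunk length, and the discount value are placeholders, not the qc implementation.

```python
# Illustrative sketch of acting in action-chunk space (placeholder interfaces,
# not the qc codebase): each chunk of H actions becomes one replay transition,
# so the critic later sees (obs, chunk) as its input.

GAMMA = 0.99
H = 5  # chunk length; a hypothetical value for illustration

def collect_chunked_episode(env, policy):
    obs = env.reset()
    transitions, done = [], False
    while not done:
        chunk = policy.sample_chunk(obs)      # shape (H, action_dim)
        chunk_return, discount = 0.0, 1.0
        for action in chunk:                  # execute the whole chunk open-loop
            next_obs, reward, done = env.step(action)
            chunk_return += discount * reward
            discount *= GAMMA
            if done:
                break
        transitions.append((obs, chunk, chunk_return, next_obs, done))
        obs = next_obs
    return transitions
```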
It makes sense that temporally extended actions would help quite a bit with exploration and credit assignment, right?
Action chunking works really well in imitation learning, and is essential to learning good BC policies in robotics. Can/should we apply the same idea in RL? We find that RL in the action chunk space, when done right (we call it ✨Q-chunking ✨), can be highly efficient🧵👇
Action chunking + expressive action distribution → better exploration for RL! This was one of the biggest lessons we learned in DPPO as well.
In our recent work using offline RL for active perception, action chunking significantly boosts model performance. It's exciting to see work that thoroughly analyzes its mechanism; looking forward to more applications of it.
Action chunking is a great idea in robotics: by getting a model to produce a short sequence of actions, it _just works better_ for some mysterious reason. Now it turns out this can help in RL too, and it's a bit clearer why: action chunks help explore and help with backups. 🧵👇
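On the "help with backups" point, a toy version of the temporal-difference target that chunk-level transitions give you is shown below. `q_target` and `policy` are stand-ins (not the paper's networks), and the chunk length and discount are assumed values; the point is only that bootstrapping once per chunk lets value information propagate H environment steps per backup.

```python
# Toy H-step TD target for a chunk-level transition (illustrative, not the paper's code).

def chunk_td_target(transition, policy, q_target, gamma=0.99, H=5):
    obs, chunk, chunk_return, next_obs, done = transition
    next_chunk = policy.sample_chunk(next_obs)
    # Bootstrapping once per chunk: the discount jumps by gamma**H, so the backup
    # covers H environment steps at a time instead of one.
    return chunk_return + (1.0 - done) * (gamma ** H) * q_target(next_obs, next_chunk)
```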