Paul Zhou
@zhiyuan_zhou_
phd @berkeley_ai, research intern @physical_int
We tested WSRL (Warm-start RL) on a Franka robot, and it leads to really efficient online RL fine-tuning in the real world! WSRL learned the peg insertion task perfectly with only 11 minutes of warmup and *7 minutes* of online RL interactions 👇🧵
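For readers curious what the warmup-then-online-RL loop looks like, here is a minimal sketch of that two-phase structure. It is not the WSRL codebase; the `env`, `agent`, and `replay_buffer` interfaces and the idea of collecting warmup data with the pre-trained policy before starting updates are assumptions made for illustration.

```python
# A minimal sketch of a warm-start loop (illustrative interfaces, not the WSRL code):
# phase 1 seeds the replay buffer with the pre-trained policy, phase 2 fine-tunes online.

def warm_start_rl(env, agent, replay_buffer, warmup_steps, online_steps):
    obs = env.reset()

    # Warmup phase: collect transitions with the pre-trained policy, no gradient updates yet.
    for _ in range(warmup_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        replay_buffer.add(obs, action, reward, next_obs, done)
        obs = env.reset() if done else next_obs

    # Online phase: keep collecting and run standard off-policy updates from the buffer.
    for _ in range(online_steps):
        action = agent.act(obs)
        next_obs, reward, done = env.step(action)
        replay_buffer.add(obs, action, reward, next_obs, done)
        agent.update(replay_buffer.sample())
        obs = env.reset() if done else next_obs

    return agent
```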
Check out our poster on “Reinforcement Learning with Action Chunking” tomorrow 11:45-14:15 @ EXAIT Workshop (Meeting Room 205-207)!
Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N
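As a rough picture of what "RL in the action chunk space" means, the sketch below has the policy emit a chunk of H low-level actions, executes the chunk open-loop, and stores one chunk-level transition per chunk. The `policy.sample_chunk` interface, the chunk length, and the discount value are placeholders, not the qc implementation.

```python
# Illustrative sketch of acting in action-chunk space (placeholder interfaces,
# not the qc codebase): each chunk of H actions becomes one replay transition,
# so the critic later sees (obs, chunk) as its input.

GAMMA = 0.99
H = 5  # chunk length; a hypothetical value for illustration

def collect_chunked_episode(env, policy):
    obs = env.reset()
    transitions, done = [], False
    while not done:
        chunk = policy.sample_chunk(obs)      # shape (H, action_dim)
        chunk_return, discount = 0.0, 1.0
        for action in chunk:                  # execute the whole chunk open-loop
            next_obs, reward, done = env.step(action)
            chunk_return += discount * reward
            discount *= GAMMA
            if done:
                break
        transitions.append((obs, chunk, chunk_return, next_obs, done))
        obs = next_obs
    return transitions
```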
It makes sense that temporally extended actions would help quite a bit with exploration and credit assignment, right?
Action chunking works really well in imitation learning, and is essential to learning good BC policies in robotics. Can/should we apply the same idea in RL? We find that RL in the action chunk space, when done right (we call it ✨Q-chunking ✨), can be highly efficient🧵👇
Action chunking + expressive action distribution → better exploration for RL! This was one of the biggest lessons we learned in DPPO as well.
In our recent work using offline RL for active perception, action chunking significantly boosts model performance. It's exciting to see work that thoroughly analyzes its mechanism; looking forward to more applications of it.
Action chunking is a great idea in robotics: by getting a model to produce a short sequence of actions, it _just works better_ for some mysterious reason. Now it turns out this can help in RL too, and it's a bit clearer why: action chunks help explore and help with backups. 🧵👇
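On the "help with backups" point, a toy version of the temporal-difference target that chunk-level transitions give you is shown below. `q_target` and `policy` are stand-ins (not the paper's networks), and the chunk length and discount are assumed values; the point is only that bootstrapping once per chunk lets value information propagate H environment steps per backup.

```python
# Toy H-step TD target for a chunk-level transition (illustrative, not the paper's code).

def chunk_td_target(transition, policy, q_target, gamma=0.99, H=5):
    obs, chunk, chunk_return, next_obs, done = transition
    next_chunk = policy.sample_chunk(next_obs)
    # Bootstrapping once per chunk: the discount jumps by gamma**H, so the backup
    # covers H environment steps at a time instead of one.
    return chunk_return + (1.0 - done) * (gamma ** H) * q_target(next_obs, next_chunk)
```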