Allen Z. Ren
@allenzren
Generalist robot policy @physical_int, PhD @Princeton
👇Introducing DPPO, Diffusion Policy Policy Optimization. DPPO optimizes a pre-trained Diffusion Policy using policy gradients from RL, showing 𝘀𝘂𝗿𝗽𝗿𝗶𝘀𝗶𝗻𝗴 𝗶𝗺𝗽𝗿𝗼𝘃𝗲𝗺𝗲𝗻𝘁𝘀 over a variety of baselines across benchmarks and in sim2real transfer. diffusion-ppo.github.io
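The core DPPO idea — treat each denoising step of the diffusion policy as an action in a two-layer MDP and run PPO on the whole chain — can be sketched in a few lines. Everything below (the toy means, std, advantage) is a hypothetical stand-in, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_logpdf(x, mean, std):
    # log N(x; mean, std^2 I), summed over action dims
    return float(np.sum(-0.5 * ((x - mean) / std) ** 2
                        - np.log(std) - 0.5 * np.log(2 * np.pi)))

# Toy "denoising policy": each of K steps is a Gaussian whose mean the
# network would predict; the chain a_K -> ... -> a_0 is one rollout.
K, act_dim, std = 4, 2, 0.1
means = [rng.normal(size=act_dim) for _ in range(K)]          # hypothetical
samples = [m + std * rng.normal(size=act_dim) for m in means]

# Each denoising step is an "action" in the two-layer MDP, so the
# log-prob of the sampled chain is the sum of per-step log-probs.
logp_old = sum(gaussian_logpdf(s, m, std) for s, m in zip(samples, means))

def ppo_objective(logp_new, logp_old, advantage, clip=0.2):
    # standard PPO clipped surrogate, applied to the denoising chain
    ratio = np.exp(logp_new - logp_old)
    return min(ratio * advantage,
               np.clip(ratio, 1 - clip, 1 + clip) * advantage)

adv = 1.0  # advantage of this action chunk, from a critic
print(ppo_objective(logp_old, logp_old, adv))  # ratio = 1 -> equals adv
```

The clipping keeps the fine-tuned denoiser close to the pre-trained one per update, which is what makes gradient steps through the long denoising chain stable.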
We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵
Want robot imitation learning to generalize to new tasks? Blindfold your human demonstrator! Best robotics paper at EXAIT Workshop #ICML2025 openreview.net/forum?id=zqfT2… Wait, why does this make sense? Read below!
Action chunking + expressive action distribution —> Better exploration for RL! This was one of the biggest lessons we learned in DPPO as well
Action chunking works really well in imitation learning, and is essential to learning good BC policies in robotics. Can/should we apply the same idea in RL? We find that RL in the action chunk space, when done right (we call it ✨Q-chunking ✨), can be highly efficient🧵👇
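The efficiency argument behind Q-chunking can be seen in the TD backup: a critic over whole chunks bootstraps with an h-step return, so value propagates h steps per update instead of one. A toy sketch with made-up numbers, not the paper's code:

```python
# Toy Q-chunking backup: the critic scores an entire action chunk
# a_{t:t+h}; the TD target uses the h-step discounted return.
h, gamma = 4, 0.99

def chunk_td_target(rewards, q_next):
    # rewards: the h rewards collected while executing one chunk
    # q_next: critic value of the next (state, chunk) pair
    ret = sum(gamma ** i * r for i, r in enumerate(rewards))
    return ret + gamma ** h * q_next

rewards = [1.0, 0.0, 0.0, 1.0]   # hypothetical per-step rewards
print(chunk_td_target(rewards, q_next=5.0))
```

Pairing this with an expressive (e.g. flow/diffusion) chunk sampler gives temporally coherent exploration, the same lesson the DPPO tweet above points at.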
Diffusion policies have demonstrated impressive performance in robot control, yet are difficult to improve online when 0-shot performance isn’t enough. To address this challenge, we introduce DSRL: Diffusion Steering via Reinforcement Learning. (1/n) diffusion-steering.github.io
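The steering idea in DSRL, as described here, is to leave the diffusion policy frozen and instead learn which latent noise to feed it. A minimal illustration with a one-shot random search in noise space standing in for the RL policy (all functions and numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "diffusion policy": deterministically maps an initial noise
# vector w (plus the observation) to an action -- a stand-in for the
# full denoising chain, which is a deterministic function of w.
W = rng.normal(size=(2, 2))
def frozen_policy(obs, w):
    return np.tanh(W @ w + obs)

def reward(action, goal):
    return -float(np.linalg.norm(action - goal))

# Steering: search over w, never touching the policy's weights.
obs, goal = np.zeros(2), np.array([0.5, -0.5])
candidates = rng.normal(size=(64, 2))            # sampled latent noises
scores = [reward(frozen_policy(obs, w), goal) for w in candidates]
best_w = candidates[int(np.argmax(scores))]
print(reward(frozen_policy(obs, best_w), goal) == max(scores))  # True
```

Because the noise space is low-dimensional relative to the policy's weights, online improvement can be far cheaper than fine-tuning the denoiser itself.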
Join us at two workshops #RSS2025 on 6/21! 📍 Resource Constrained Robotics (RTH109) 🗣️ Oral talk: 11:00–11:15 📍 Continual Robot Learning from Humans (OHE132) 🖼️ Spotlight poster: 10:30–11:00 Come by and chat—we’re excited to share our work!
Want your imitation learning policy to generalize better, but not sure how to collect data to achieve this? 🤖🤔 Enter Factored Scaling Curves (FSC): a tool that quantifies how policy success scales with demos for each environmental factor, enabling principled data collection 📈 . 🌐…
🔎Can robots search for objects like humans? Humans explore unseen environments intelligently—using prior knowledge to actively seek information and guide search. But can robots do the same? 👀 🚀Introducing WoMAP (World Models for Active Perception): a novel framework for…
In LLM land, a slow model is annoying. In robotics, a slow model can be disastrous! Visible pauses at best, dangerously jerky motions at worst. But large VLAs are slow by nature. What can we do about this? An in-depth 🧵:
Our models need to run in real time on real robots, but inference with big VLAs takes a long time. We developed Real-Time Action Chunking (RTC) to enable real-time inference with flow matching for the π0 and π0.5 VLAs! More in the thread👇
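The scheduling trick behind real-time chunking can be sketched independently of the flow-matching details: while the robot executes the first d actions of the current chunk (d = inference latency in control steps), the next chunk is generated, and its first d actions are frozen to match what is already committed, so execution never pauses or jumps. A toy splice, not the actual RTC inpainting procedure:

```python
def splice(prev_remaining, fresh, d):
    # prev_remaining: actions of the old chunk still scheduled to run
    # fresh: newly generated chunk covering the same time window
    # The first d steps are already committed (they execute while the
    # new chunk is being computed), so they come from the old plan;
    # the remainder comes from the fresh chunk.
    return prev_remaining[:d] + fresh[d:]

old = [f"a{i}" for i in range(8)]     # hypothetical old chunk
fresh = [f"b{i}" for i in range(8)]   # hypothetical new chunk
print(splice(old, fresh, d=3))        # ['a0','a1','a2','b3',...,'b7']
```

In RTC proper, the frozen prefix is enforced inside the flow-matching sampler (inpainting-style) rather than by list slicing, so the fresh suffix stays consistent with the committed prefix.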
Always fun to chat about generalization with Ani :)
Great to have @allenzren back at @Princeton for the PhD hooding ceremony, so I could ask him a whole bunch of questions about the pi_0.5 paper from @physical_int!
Our newest VLA training recipe achieves fast training, fast inference, and great performance, by carefully designing the interface between model backbone and continuous actions. Many lessons learned along the way👇
We figured out how to train VLAs with diffusion outputs much faster (7.5x faster), inheriting better language following from the VLM, and leading to better results. The key: protect the VLM backbone during training with knowledge insulation. Let’s talk about what we learned👇
How to build vision-language-action models that train fast, run fast & generalize? In our new paper, we formalize & analyze the approach of our π-0.5 model & further improve it with a single stage recipe. Blog: pi.website/research/knowl… Paper: pi.website/download/pi05_…
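The insulation idea can be shown with a hand-derived gradient: the continuous-action loss flows into the action expert but is detached before the VLM backbone, so it cannot degrade the pretrained weights (the backbone still trains on its own token-level loss). A toy linear stand-in, not the actual π-0.5 code:

```python
import numpy as np

x = np.array([1.0, 2.0])                    # toy "observation"
Wb = np.array([[0.5, -0.3], [0.1, 0.4]])    # backbone weights
We = np.array([[0.2, 0.7]])                 # action-expert weights

feat = Wb @ x                 # backbone features
feat_sg = feat.copy()         # stop_gradient boundary (detach in torch,
                              # jax.lax.stop_gradient in JAX)
action = We @ feat_sg
loss = 0.5 * float(action @ action)

# Backprop the action loss by hand:
dloss_daction = action
dWe = np.outer(dloss_daction, feat_sg)      # the expert still learns
dWb = np.zeros_like(Wb)                     # insulated: nothing reaches Wb
print(np.allclose(dWb, 0))                  # True
```

Because no diffusion-style gradient ever touches `Wb`, the backbone keeps its language-following behavior from VLM pretraining, which is the "knowledge insulation" being named above.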
📢 Excited for the second workshop on Out-of-Distribution Generalization in Robotics: Towards Reliable Learning-based Autonomy at RSS! #RSS2025 🎯 How can we build reliable robotic autonomy for the real world? 📅 Short papers due 05/25/25 🌐 tinyurl.com/rss2025ood 🧵(1/4)
Nice to see the use of ManiSkill3 in this work! Simulation is not just useful for RL training; it also provides cheap, deterministic test beds, perfect for testing imitation learning scaling laws at scale. Years of data in hours.
In the era of generalist robot foundation models, how do you get their pre-trained model to work well on your robot and task? 🌐factored-data-scaling.github.io 📈 We introduce Factored Scaling Curves (FSC): a principled approach for modeling how policy performance scales with data for…
Data is the fuel that drives robot learning, but we don't have great strategies for figuring out what data to collect to enable strong generalization. Check out @LihanZha's first paper as a PhD student at @Princeton! 𝐆𝐮𝐢𝐝𝐢𝐧𝐠 𝐃𝐚𝐭𝐚 𝐂𝐨𝐥𝐥𝐞𝐜𝐭𝐢𝐨𝐧 𝐯𝐢𝐚…
Guided Data Collection via Factored Scaling Curves: a principled method for deciding what data to collect, and how much to collect for each factor, by constructing factored scaling curves.
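The decision rule can be sketched as: fit a success-vs-demos curve per environmental factor, then spend the next collection budget on the factor with the largest predicted marginal gain. The saturating curve form and all numbers below are illustrative choices, not the paper's fits:

```python
import numpy as np

def curve(n, a, b):
    # toy saturating scaling curve: success rate vs. demo count n
    return a * (1 - np.exp(-b * n))

# hypothetical fitted (a, b) per environmental factor
factors = {"lighting": (0.9, 0.02), "pose": (0.8, 0.10)}
demos_so_far = {"lighting": 50, "pose": 50}

def marginal_gain(name, extra=10):
    # predicted success improvement from `extra` more demos of this factor
    a, b = factors[name]
    n = demos_so_far[name]
    return curve(n + extra, a, b) - curve(n, a, b)

best = max(factors, key=marginal_gain)
print(best)  # "lighting": its curve is far from saturating
```

Here "pose" has nearly saturated (b = 0.10 at 50 demos), so the next demos go to "lighting" — the kind of allocation the curves are meant to make principled.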
Introducing ✨Latent Diffusion Planning✨ (LDP)! We explore how to use expert, suboptimal, & action-free data. To do so, we learn a diffusion-based *planner* that forecasts latent states, and an *inverse-dynamics model* that extracts actions. w/ @_oleh @DorsaSadigh @chelseabfinn
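The planner/inverse-dynamics split can be sketched with linear stand-ins: a planner proposes future latent states (which action-free data can supervise), and a separate inverse-dynamics model recovers the action between consecutive latents. The dynamics, plan, and least-squares inverse model below are all hypothetical, not LDP's networks:

```python
import numpy as np

# Toy latent dynamics z' = A z + B a, standing in for the environment.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])

def inverse_dynamics(z, z_next):
    # least-squares action explaining the transition z -> z_next
    a, *_ = np.linalg.lstsq(B, z_next - A @ z, rcond=None)
    return a

# "Planner" output: a hypothetical latent trajectory toward a goal.
plan = [np.array([0.0, 0.0]), np.array([0.0, 1.0]), np.array([0.1, 1.5])]

# Extract the action for each consecutive pair of planned latents.
actions = [inverse_dynamics(z, zn) for z, zn in zip(plan, plan[1:])]
print([round(float(a[0]), 3) for a in actions])  # [1.0, 0.5]
```

The split is what lets suboptimal and action-free data help: they can train the latent forecaster even when no action labels exist for the inverse model.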
Check out our newest work in bringing robots closer to open-world generalization! It was truly amazing to see (1) data scaling and (2) iterating over the cross-embodiment co-training recipe solved the tasks that the robot struggled with when I first joined Pi.
We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️