Pranav Atreya
@pranav_atreya
Robot learning | CS Ph.D. student @berkeley_ai
In robotics, benchmarks are rarely shared. New eval setups are created for each new project, a stark difference from evals in broader ML. But generalist policies share a problem statement: do any task in any environment. Can generalist capabilities make robot evaluation easier?
We’re releasing the RoboArena today!🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵
At CoRL this September we'll be organizing a generalist robot policy development challenge, with policies evaluated and ranked by the RoboArena benchmark! This challenge will be embedded in the 1st workshop on generalist policies in the wild. sites.google.com/view/corl-robo…
Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: arxiv.org/abs/2506.04168 Thread ↓
Can we make robot policy evaluation easier and less time-consuming? Introducing AutoEval, a system that *autonomously* evaluates generalist policies 24/7 and closely matches human results. We make 4 tasks 💫publicly available💫 Submit your policy at auto-eval.github.io! 🧵👇
Current robot learning methods are good at imitating tasks seen during training, but struggle to compose behaviors in new ways. When training imitation policies, we found something surprising—using temporally-aligned task representations enabled compositional generalization. 1/
Excited to release FAST, our new robot action tokenizer! 🤖 Some highlights:
- Simple autoregressive VLAs match diffusion VLA performance
- Trains up to 5x faster
- Works on all robot datasets we tested
- First VLAs that work out-of-the-box in new environments!
🧵/
In a future where robots are ubiquitously deployed, autonomous robot data will be a considerable data source. What would it take to tap into this data? I'll be in Poster Session 4 of @corl_conf this Friday to discuss our first step towards tackling this problem! Come stop by!
Robot learning requires data, but does the data need to be human-collected? 🙅🏼 Can we bootstrap a self-improvement process with a pre-trained policy? 🤖🦾 Presenting SOAR: an approach to scalable autonomous improvement of a language-conditioned policy. auto-improvement.github.io 🧵