Fangchen Liu
@fangchenliu_
Ph.D. @Berkeley_AI, prev @PKU1898 @HaoSuLabUCSD
@qiyang_li will present OTTER tomorrow at #ICML2025! A lightweight, instruction-following VLA! See the OG post below! 👉 Code is already released at ottervla.github.io. Poster at West Exhibition Hall B2-B3 #W-409, Tue 15 Jul, 11 a.m.–1:30 p.m. PDT
1/N Most Vision-Language-Action models need tons of data for finetuning, and still fail for new objects and instructions. Introducing OTTER, a lightweight, easy-to-train model that uses text-aware visual features to nail unseen tasks out of the box! Here's how it works 👇
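A hedged sketch of the core idea of "text-aware visual features" from a frozen CLIP: instead of a single generic image embedding, the instruction's text tokens attend over the image's patch tokens, so the policy sees language-weighted visual features. The model choice, pooling, and shapes below are illustrative assumptions, not the exact OTTER implementation (that code is at ottervla.github.io).

```python
# Minimal sketch of "text-aware" visual features with a frozen CLIP backbone.
# Illustrative only; see ottervla.github.io for OTTER's actual implementation.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def text_aware_visual_features(image, instruction):
    inputs = processor(text=[instruction], images=image,
                       return_tensors="pt", padding=True)
    # Per-patch visual tokens (CLS dropped) and per-token text features
    # from the frozen encoders.
    patch_tokens = model.vision_model(
        pixel_values=inputs["pixel_values"]).last_hidden_state[:, 1:]   # (1, P, Dv)
    text_tokens = model.text_model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"]).last_hidden_state      # (1, T, Dt)
    # Project both into CLIP's shared embedding space.
    patches = model.visual_projection(patch_tokens)   # (1, P, D)
    words = model.text_projection(text_tokens)        # (1, T, D)
    # Each text token attends over image patches: language-weighted visual
    # features that a small, trainable policy can consume directly.
    attn = torch.softmax(
        words @ patches.transpose(1, 2) / patches.shape[-1] ** 0.5, dim=-1)  # (1, T, P)
    return attn @ patches                             # (1, T, D)
```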
Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N
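A minimal sketch of what "RL over action chunks" can look like: the critic scores a whole chunk of H actions, and the TD target sums the in-chunk rewards before bootstrapping H steps ahead. Chunk length, network sizes, and names here are assumptions for illustration, not the paper's exact recipe (see colinqiyangli.github.io/qc/).

```python
# Sketch: off-policy RL where the policy outputs a chunk of H actions and the
# critic evaluates the whole chunk. Sizes and names are assumed, not the paper's.
import torch
import torch.nn as nn

H, obs_dim, act_dim = 4, 32, 7  # example sizes (assumed)

def mlp(i, o):
    return nn.Sequential(nn.Linear(i, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, o))

actor = mlp(obs_dim, H * act_dim)        # policy emits H actions at once
critic = mlp(obs_dim + H * act_dim, 1)   # Q(s, a_{t:t+H}) scores the whole chunk

def td_target(r_chunk, next_obs, gamma=0.99, done=0.0):
    # Discounted sum of the H in-chunk rewards, then bootstrap with the
    # critic's value of the next chunk proposed by the actor.
    disc = gamma ** torch.arange(H, dtype=torch.float32)
    next_chunk = actor(next_obs)
    q_next = critic(torch.cat([next_obs, next_chunk], dim=-1))
    return (disc * r_chunk).sum(-1, keepdim=True) + (gamma ** H) * (1.0 - done) * q_next
```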
Join us to explore the frontier of humanoid agents at CVPR👇
Join us tomorrow for the 1st Workshop on Humanoid Agents! We have an exciting lineup: @xiaolonw @xavierpuigf @GuanyaShi @GerardPonsMoll1 @blacksquirrel__ @tianminshu @petitegeek @xbpeng4 📍 Room 101 D, Music City Center 🔗 humanoid-agents.github.io @CVPR #CVPR2025
✈️to #CVPR2025 to give three workshop/tutorial talks about learning humanoid whole-body control and loco-manipulation: - Wed 8:30am @ 3D Scene Understanding, 106C - Wed 10am @ Humanoid Agent, 101D - Thu 11am @ Robotics 101 tutorial, 202B Excited to meet old & new friends!
Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵
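For reference, a hedged sketch of the TD3 core that FastTD3 builds on: twin target critics, clipped target-policy smoothing noise, and a clipped double-Q target. The speed comes from the engineering around this update (e.g., many parallel environments and large batches); this is not the released implementation, just the textbook target computation.

```python
# Sketch of a standard TD3 target: target-policy smoothing + clipped double-Q.
# Not FastTD3's actual code; function and argument names are illustrative.
import torch

def td3_critic_target(q1_tgt, q2_tgt, actor_tgt, next_obs, reward, done,
                      gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    a = actor_tgt(next_obs)
    # Target-policy smoothing: add clipped Gaussian noise to the target action.
    noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
    next_act = (a + noise).clamp(-max_action, max_action)
    # Clipped double-Q: take the minimum of the two target critics.
    sa = torch.cat([next_obs, next_act], dim=-1)
    q_next = torch.min(q1_tgt(sa), q2_tgt(sa))
    return reward + gamma * (1.0 - done) * q_next
```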
People are collecting large-scale teleoperation datasets, which are often just kinematics-level trajectories. Real2Render2Real is a new framework that can generate this data without teleoperation or tricky sim+RL. High data quality for BC plus a nice scaling effect, please dive in for more!
Tired of teleoperating your robots? We built a way to scale robot datasets without teleop, dynamic simulation, or even robot hardware. Just one smartphone scan + one human hand demo video → thousands of diverse robot trajectories. Trainable by diffusion policy and VLA models…
My goal throughout my PhD has been to take robots out of the lab and into the real world. It was so special to be a part of this effort and see this dream become reality! Excited to keep pushing model capabilities—and, of course, keep playing with robots 🤖
We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️
Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math. The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥 Links below:
Join us at the 1st Workshop on Humanoid Agents @CVPR! #CVPR2025 Speakers in CV, CG, Robotics & CogSci will share insights on building virtual & physical human-like AI agents. 💃🤖🦾 📢 Submit your work & spark interdisciplinary discussions! 🔗 Details: humanoid-agents.github.io
New VLA work from @fangchenliu_ @RavenHuang4 @letian_fu and it's all open source! Cool insights on how to better leverage pretrained vision and language models for robotics. Code in both JAX and PyTorch!
We had all the ingredients years ago—CLIP has been around since 2021! OTTER shows that combining these existing tools in the right way unlocks powerful robotic control capabilities. Lightweight (~30M-params policy!), real-time, and fully open-sourced @ ottervla.github.io
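To make the "lightweight" claim concrete, here is a rough parameter-count sketch of a small transformer policy head sitting on top of frozen CLIP features; the layer sizes below are assumptions for illustration, not OTTER's actual configuration.

```python
# Rough parameter-count sketch of a lightweight policy head over frozen,
# text-aware CLIP features. Sizes are assumed, not OTTER's exact config.
import torch.nn as nn

d_model, n_layers, act_dim = 512, 8, 7  # assumed sizes
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8,
                               dim_feedforward=2048, batch_first=True),
    num_layers=n_layers,
)
action_head = nn.Linear(d_model, act_dim)
n_params = sum(p.numel() for m in (encoder, action_head) for p in m.parameters())
print(f"trainable policy: {n_params / 1e6:.1f}M parameters")  # ~25M with these sizes
```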