Xuxin Cheng
@xuxin_cheng
Robot Learning; Embodied AI; PhD @UCSanDiego MS @CarnegieMellon Prev @UCBerkeley
Meet 𝐀𝐌𝐎 — our universal whole‑body controller that unleashes the 𝐟𝐮𝐥𝐥 kinematic workspace of humanoid robots to the physical world. AMO is a single policy trained with RL + Hybrid Mocap & Trajectory‑Opt. Accepted to #RSS2025. Try our open models & more 👉…
How to generate billion-scale manipulation demonstrations easily? Let us leverage generative models! 🤖✨ We introduce Dex1B, a framework that generates 1 BILLION diverse dexterous hand demonstrations for both grasping 🖐️ and articulation 💻 tasks using a simple C-VAE model.
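For intuition, here is a minimal conditional VAE sketch in PyTorch. This is not the Dex1B architecture: the layer sizes, the 64-D scene condition, and the 51-D hand-pose vector are assumptions purely for illustration. The model encodes a (pose, condition) pair into a latent, and at generation time decodes fresh latents drawn from the prior into new grasps.

```python
# Minimal conditional VAE sketch for generating hand-pose demonstrations.
# Illustrative only: layer sizes, the 64-D condition, and the 51-D hand pose
# (wrist 6-DoF + joint angles) are assumptions, not Dex1B's actual design.
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, pose_dim=51, cond_dim=64, latent_dim=32, hidden=256):
        super().__init__()
        self.latent_dim = latent_dim
        self.encoder = nn.Sequential(
            nn.Linear(pose_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),   # outputs mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, pose_dim),         # reconstructed hand pose
        )

    def forward(self, pose, cond):
        mu, logvar = self.encoder(torch.cat([pose, cond], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        recon = self.decoder(torch.cat([z, cond], -1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return recon, kl

    @torch.no_grad()
    def sample(self, cond):
        # Draw z ~ N(0, I) and decode one grasp per condition vector.
        z = torch.randn(cond.shape[0], self.latent_dim, device=cond.device)
        return self.decoder(torch.cat([z, cond], -1))
```

Scaling to a billion demonstrations then amounts to sampling many latents per scene and keeping only the samples that pass whatever feasibility check the pipeline applies.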
Imagine robots learning new skills—without any robot data. Today, we're excited to release EgoZero: our first steps in training robot policies that operate in unseen environments, solely from data collected through humans wearing Aria smart glasses. 🧵👇
How can we leverage diverse human videos to improve robot manipulation? Excited to introduce EgoVLA — a Vision-Language-Action model trained on egocentric human videos by explicitly modeling wrist & hand motion. We build a shared action space between humans and robots, enabling…
Want to learn more about the technical details behind recent humanoid dancing controllers? Check out this blog from @breadli428.
🧠With the shift in humanoid control from pure RL to learning from demonstrations, we take a step back to unpack the landscape. 🔗breadli428.github.io/post/lfd/ 🚀Excited to share our blog post on Feature-based vs. GAN-based Learning from Demonstrations—when to use which, and why it…
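For a one-screen version of the distinction the post draws, here is a rough sketch of the two reward styles. The feature definition and the discriminator are placeholders, not the blog's exact formulation: feature-based tracking scores rollouts against reference-motion features directly, while the GAN-based (GAIL/AMP-style) route learns a discriminator between demonstration and policy transitions and uses its score as the reward.

```python
# Sketch of the two reward styles discussed in the post (names are illustrative).
import torch

def feature_based_reward(state, ref_state, w=1.0):
    # Reward is a direct distance between tracked features (e.g., joint angles,
    # end-effector positions) of the policy state and the reference motion frame.
    return torch.exp(-w * (state - ref_state).pow(2).sum(-1))

def gan_based_reward(disc, state_transition):
    # GAIL/AMP-style reward: a discriminator trained to score demonstration
    # transitions high and policy transitions low; its output becomes the reward.
    d = disc(state_transition)                      # disc: nn.Module, scalar logit
    return -torch.log(torch.clamp(1.0 - torch.sigmoid(d), min=1e-4))
```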
AMO live demo at #RSS2025! 👉 amo-humanoid.github.io
🚀 Meet ACE-F — a next-gen teleop system merging human and robot precision. Foldable, portable, cross-platform — it enables 6-DoF haptic control for force-aware manipulation. 🦾 See our demo & talk at the Robot Hardware-Aware Intelligence workshop this Wed @RoboticsSciSys!
Would like to acknowledge @xuxin_cheng and @EpisodeYang for their open-sourced projects—our codebase builds on top of openTeleVision and Vuer.
Teleoperation still has a lot to explore. In this work, point cloud + 6DoF head tracking enables searching and operating in cluttered environments. By streaming the point cloud and rendering it on device, the user perceives lower latency and a more immersive view. Congrats on the amazing work!
Your bimanual manipulators might need a Robot Neck 🤖🦒 Introducing Vision in Action: Learning Active Perception from Human Demonstrations. ViA learns task-specific, active perceptual strategies—such as searching, tracking, and focusing—directly from human demos, enabling robust…
This work is not about a new technique. GMT (General Motion Tracking) shows, through good engineering practice, that you can actually train a single unified whole-body control policy for all agile motions, and that it works in the real world, direct sim2real without adaptation. This is…
Coordinating diverse, high-speed motions with a single control policy has been a long-standing challenge. Meet GMT—our universal tracker that keeps up with a whole spectrum of agile movements, all with one single policy.
🚀Introducing GMT — a general motion tracking framework that enables high-fidelity motion tracking on humanoid robots by training a single policy from large, unstructured human motion datasets. 🤖A step toward general humanoid controllers. Project Website:…
Despite great advances in learning dexterity, hardware remains a major bottleneck. Most dexterous hands are bulky, weak, or expensive. I’m thrilled to present the RUKA Hand — a powerful, accessible research tool for dexterous manipulation that overcomes these limitations!
Test-Time Training (TTT) is now on video! And not just a 5-second video: we can generate a full 1-minute video! The TTT module is an RNN-style layer that provides an explicit and efficient memory mechanism. It models the hidden state of the RNN with a machine learning model, which is updated…
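A toy sketch of that idea, heavily simplified relative to the actual TTT video model: the recurrent "hidden state" is the weight matrix of a tiny inner model, and each incoming token updates those weights with one hand-derived gradient step on a self-supervised loss.

```python
# Toy Test-Time Training (TTT) layer: the recurrent "hidden state" is the weight
# matrix W of a small inner model, updated by one gradient step per token.
# Simplified for illustration; the published TTT layers use richer inner models
# and learned projections.
import torch

def ttt_layer(tokens, dim, lr=0.1):
    W = torch.zeros(dim, dim)             # hidden state = inner model's weights
    outputs = []
    for x in tokens:                       # x: (dim,) token embedding
        # Inner-loop self-supervised loss: reconstruct the token from itself.
        err = W @ x - x
        grad = 2.0 * torch.outer(err, x)   # d(||W x - x||^2)/dW, written by hand
        W = W - lr * grad                  # update the memory (the weights)
        outputs.append(W @ x)              # layer output uses the updated memory
    return torch.stack(outputs)
```

The memory is explicit (it is literally W), and each token costs a fixed-size update, which is what makes long sequences such as minute-long video tractable.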
Collect human data at scale using a VR app; then, when you want to perform a task, predict human joints and retarget them to humanoids. Really cool work
Diverse training data leads to a more robust humanoid manipulation policy, but collecting robot demonstrations is slow. Introducing our latest work, Humanoid Policy ~ Human Policy. We advocate human data as a scalable data source for co-training an egocentric manipulation policy.⬇️
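Co-training of this flavor often reduces to a mixed data loader: human egocentric clips and robot demos are mapped into one shared (observation, action) format, and a single policy is trained on both. A hypothetical sketch with made-up dataset classes and a plain behavior-cloning loss, not the paper's exact recipe:

```python
# Hypothetical co-training loop mixing human and robot egocentric data.
# HumanEgoDataset / RobotDemoDataset and the mixing scheme are assumptions
# for illustration only.
import torch
from torch.utils.data import DataLoader, ConcatDataset

def cotrain(policy, human_ds, robot_ds, steps=10_000, lr=1e-4):
    loader = DataLoader(ConcatDataset([human_ds, robot_ds]),
                        batch_size=64, shuffle=True)
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    it = iter(loader)
    for _ in range(steps):
        try:
            obs, action = next(it)        # both sources share one (obs, action) format
        except StopIteration:
            it = iter(loader)
            obs, action = next(it)
        loss = ((policy(obs) - action) ** 2).mean()   # simple behavior-cloning loss
        opt.zero_grad(); loss.backward(); opt.step()
```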
Teleoperation is so tedious. Can we find a better way to scale real-world data? Answer: human videos. Inspiration: when we were doing teleoperation, we observed that the motion performed by the human is almost the same as the robot's; it is really just a 3D transformation away. Besides…
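That "3D transformation away" intuition is easy to sketch: estimate the human wrist pose in the egocentric camera frame, then compose it with a fixed camera-to-robot calibration to get an end-effector target. All names and numbers below are hypothetical.

```python
# Hypothetical sketch: retarget a human wrist pose to a robot end-effector target
# by composing it with a fixed rigid transform (the "3D transformation away").
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from a rotation matrix and translation."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def retarget_wrist(T_cam_wrist, T_robot_cam):
    # T_cam_wrist: human wrist pose estimated in the (egocentric) camera frame.
    # T_robot_cam: fixed calibration from camera frame to robot base frame.
    # The robot end-effector target is just the composition of the two.
    return T_robot_cam @ T_cam_wrist

# Example: identity rotations, wrist 40 cm in front of the camera, camera mounted
# 1.2 m above the robot base (all numbers made up).
T_cam_wrist = make_T(np.eye(3), np.array([0.0, 0.0, 0.4]))
T_robot_cam = make_T(np.eye(3), np.array([0.0, 0.0, 1.2]))
print(retarget_wrist(T_cam_wrist, T_robot_cam)[:3, 3])   # -> [0. 0. 1.6]
```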