Xiao Ma
@yusufma555
Staff Research Scientist @ ByteDance Seed. All views my own.
🚀🚀🚀 Ever wondered what it takes for robots to handle real-world household tasks? Long-horizon execution, deformable object dexterity, and unseen object generalization — meet GR-3, ByteDance Seed’s new Vision-Language-Action (VLA) model! GR-3 is a generalizable…
🏆 Huge congratulations to the #RSS2025 Award Winners! roboticsconference.org/program/awards/
Great work by @PYL78055244 ! BridgeVLA won the Colosseum Challenge at the #CVPR2025 GRAIL Workshop by *bridging* 2D VLM features with 3D policies, achieving strong generalization across various settings. Code & paper available now! bridgevla.github.io/home_page.html
💥 Can we combine 2D VLA generalization with 3D policy efficiency? Introducing BridgeVLA – a 3D Vision-Language-Action model bridging pretrained VLM backbones and 3D VLAs. Reusing VLM weights isn’t enough – it needs smarter design. 🚀 Results: · 1st on RLBench, COLOSSEUM,…
🚀 Check out Chain-of-Action (CoA)! The core idea is to encourage agents to *think* in a reverse manner: decide the task-specific goals first, then *chain the actions* backwards to the starting pose, which gives you much stronger spatial generalization!
🚀 Excited to share our latest research on robotic manipulation, Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation. We rethink robotic manipulation as a goal-conditioned reasoning process. -Page: chain-of-action.github.io -Paper: arxiv.org/pdf/2506.09990
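Rough toy sketch of the goal-first, reverse decoding idea (my own illustration, not the released Chain-of-Action code; the stand-in functions just show the goal → start chaining order):

```python
import numpy as np

def predict_goal_action(obs):
    # In CoA this would be a learned, task-specific goal/keyframe action;
    # here it is a fixed placeholder pose.
    return np.array([0.5, 0.3, 0.2])

def decode_step(obs, actions_so_far, start_pose, horizon):
    # Stand-in for the autoregressive decoder: move the most recently
    # predicted action one step back toward the starting pose.
    remaining = horizon - len(actions_so_far)
    return actions_so_far[-1] + (start_pose - actions_so_far[-1]) / remaining

def reverse_rollout(obs, start_pose, horizon=8):
    actions = [predict_goal_action(obs)]      # decide the goal first
    while len(actions) < horizon:
        actions.append(decode_step(obs, actions, start_pose, horizon))
    return actions[::-1]                      # reverse so execution runs start -> goal

trajectory = reverse_rollout(obs=None, start_pose=np.zeros(3))
```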
It’s been hard tuning VLMs into VLAs for robotic manipulation. Check out our latest work on training recipes for VLAs! RoboVLMs lets you build your own VLA model within just **30 lines of code**!🔥 Thanks to all my collaborators for this amazing work!
Thrilled to introduce RoboVLMs 🤖🌍, a unified, open-source, and flexible VLA framework for easily integrating any VLM into robotic tasks, within just 30 lines of code! 🚀 Through 600+ designed experiments, RoboVLMs supports 8 VLM backbones and 4 policy architectures. 📊
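For flavor, here is a bare-bones version of the "pretrained VLM backbone + action head" pattern that RoboVLMs packages up; the class, the CLIP checkpoint, and the MLP head below are my own placeholders, not the RoboVLMs API:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoProcessor

class SimpleVLA(nn.Module):
    def __init__(self, backbone="openai/clip-vit-base-patch32", action_dim=7):
        super().__init__()
        self.processor = AutoProcessor.from_pretrained(backbone)
        self.vlm = AutoModel.from_pretrained(backbone)       # pretrained VLM backbone
        hidden = self.vlm.config.projection_dim
        self.action_head = nn.Sequential(                    # maps fused features to actions
            nn.Linear(hidden * 2, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, images, instructions):
        inputs = self.processor(text=instructions, images=images,
                                return_tensors="pt", padding=True)
        out = self.vlm(**inputs)
        fused = torch.cat([out.image_embeds, out.text_embeds], dim=-1)
        return self.action_head(fused)                        # e.g. a 7-DoF end-effector action
```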
Extremely insightful work! Excited to see people looking into the issues of foundation models and taking a step back to understand the limitations of our current paradigms. It's probably a good time for us to rethink what the correct way of learning robotics world models could be.
Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video generation model is able to learn physical laws. There are three key messages to take home: 1⃣ The model generalises…
BiGym accepted to #CoRL2024! Big thanks to all my collaborators! Our paper and code are available. Check them out! Project page: chernyadev.github.io/bigym/ Simulator code: github.com/chernyadev/big… Baselines: github.com/robobase-org/r…
🚀 Looking for a benchmark for bi-manual mobile manipulation with nicely collected demonstrations? We are excited to release BiGym, a new benchmark with human-collected demos! 🌐 Website: chernyadev.github.io/bigym/ 📄 Paper: arxiv.org/abs/2407.07788 💻 Code: github.com/chernyadev/big…
Introducing CQN: Coarse-to-fine Q-Network, a value-based RL algorithm for continuous control🦾 Initialized with 20~50 demonstrations, it learns to solve real-world robotic tasks within 10 mins of training, without any pre-training or shaped rewards! (1/4) younggyo.me/cqn
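The interval-refinement trick in one toy snippet (my own rendering of the coarse-to-fine discretization idea, not the authors' code; the Q-values are given directly instead of coming from the learned, level-conditioned critics):

```python
import numpy as np

def coarse_to_fine_action(q_values_per_level, low=-1.0, high=1.0, bins=3):
    """q_values_per_level: one array of shape (bins,) per level.
    Each level zooms the continuous interval into its greedy bin."""
    for q in q_values_per_level:
        edges = np.linspace(low, high, bins + 1)
        best = int(np.argmax(q))                   # greedy bin at this resolution
        low, high = edges[best], edges[best + 1]   # zoom into the chosen bin
    return (low + high) / 2.0                      # center of the finest bin

# Example: 3 levels x 3 bins gives 27 effective bins per action dimension.
action = coarse_to_fine_action([np.array([0.1, 0.9, 0.2]),
                                np.array([0.5, 0.2, 0.7]),
                                np.array([0.3, 0.8, 0.1])])
```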
Image-generation diffusion models can draw arbitrary visual patterns. What if we finetune Stable Diffusion to 🖌️ draw joint actions 🦾 on RGB observations? Introducing 𝗚𝗘𝗡𝗜𝗠𝗔 paper, videos, code, ckpts: genima-robot.github.io 🧵Thread⬇️
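To give a flavor of "drawing actions onto the observation", here is a hedged sketch built on a generic diffusers image-to-image pass; the checkpoint name, prompt, file path, and the note about a separate low-level controller are placeholders/assumptions, not GENIMA's released pipeline:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Placeholder checkpoint; GENIMA finetunes its own Stable Diffusion weights.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

obs = Image.open("rgb_observation.png").convert("RGB").resize((512, 512))

target_image = pipe(
    prompt="open the drawer",   # task instruction conditions what gets drawn
    image=obs,                  # current camera view to be annotated
    strength=0.4,               # keep the scene, only add the drawn targets
    guidance_scale=7.5,
).images[0]
# A separate low-level controller would then track the drawn joint targets.
```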