Xiao Ma
@yusufma555
Staff Research Scientist @ ByteDance Seed. All views my own.
🚀🚀🚀 Ever wondered what it takes for robots to handle real-world household tasks? Long-horizon execution, deformable object dexterity, and unseen object generalization — meet GR-3, ByteDance Seed’s new Vision-Language-Action (VLA) model! GR-3 is a generalizable…
🏆 Huge congratulations to the #RSS2025 Award Winners! roboticsconference.org/program/awards/
Great work by @PYL78055244 ! BridgeVLA won the Colosseum Challenge at the #CVPR2025 GRAIL Workshop by *bridging* 2D VLM features with 3D policies, achieving strong generalization across various settings. Code & paper available now! bridgevla.github.io/home_page.html
💥 Can we combine 2D VLA generalization with 3D policy efficiency? Introducing BridgeVLA – a 3D Vision-Language-Action model bridging pretrained VLM backbones and 3D VLAs. Reusing VLM weights isn’t enough – it needs smarter design. 🚀 Results: · 1st on RLBench, COLOSSEUM,…
🚀 Check out Chain-of-Action (CoA)! The core idea is to encourage agents to *think* in a reverse manner: decide the task-specific goals first, then *chain the actions* backwards to the starting pose, which gives you much stronger spatial generalization!
🚀 Excited to share our latest research on robotic manipulation, Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation. We rethink robotic manipulation as a goal-conditioned reasoning process. -Page: chain-of-action.github.io -Paper: arxiv.org/pdf/2506.09990
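Rough toy sketch of the goal-first, reverse decoding idea (my own illustration, not the released Chain-of-Action code; the stand-in functions just show the goal → start chaining order):

```python
import numpy as np

def predict_goal_action(obs):
    # In CoA this would be a learned, task-specific goal/keyframe action;
    # here it is a fixed placeholder pose.
    return np.array([0.5, 0.3, 0.2])

def decode_step(obs, actions_so_far, start_pose, horizon):
    # Stand-in for the autoregressive decoder: move the most recently
    # predicted action one step back toward the starting pose.
    remaining = horizon - len(actions_so_far)
    return actions_so_far[-1] + (start_pose - actions_so_far[-1]) / remaining

def reverse_rollout(obs, start_pose, horizon=8):
    actions = [predict_goal_action(obs)]      # decide the goal first
    while len(actions) < horizon:
        actions.append(decode_step(obs, actions, start_pose, horizon))
    return actions[::-1]                      # reverse so execution runs start -> goal

trajectory = reverse_rollout(obs=None, start_pose=np.zeros(3))
```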
It’s been hard tuning VLMs into VLAs for robotic manipulation. Check out our latest work on training recipes for VLAs! RoboVLMs lets you build your own VLA model within just **30 lines of code**!🔥 Thanks to all my collaborators for this amazing work!
Thrilled to introduce RoboVLMs 🤖🌍, a unified, open-source, and flexible VLA framework for easily integrating any VLM into robotic tasks, within just 30 lines of code! 🚀 Through 600+ designed experiments, RoboVLMs supports 8 VLM backbones and 4 policy architectures. 📊
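For flavor, here is a bare-bones version of the "pretrained VLM backbone + action head" pattern that RoboVLMs packages up; the class, the CLIP checkpoint, and the MLP head below are my own placeholders, not the RoboVLMs API:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoProcessor

class SimpleVLA(nn.Module):
    def __init__(self, backbone="openai/clip-vit-base-patch32", action_dim=7):
        super().__init__()
        self.processor = AutoProcessor.from_pretrained(backbone)
        self.vlm = AutoModel.from_pretrained(backbone)       # pretrained VLM backbone
        hidden = self.vlm.config.projection_dim
        self.action_head = nn.Sequential(                    # maps fused features to actions
            nn.Linear(hidden * 2, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, images, instructions):
        inputs = self.processor(text=instructions, images=images,
                                return_tensors="pt", padding=True)
        out = self.vlm(**inputs)
        fused = torch.cat([out.image_embeds, out.text_embeds], dim=-1)
        return self.action_head(fused)                        # e.g. a 7-DoF end-effector action
```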
Extremely insightful work! Excited to see people looking into the issues of foundation models and taking a step back to understand the limitations of our current paradigms. It's probably a good time for us to rethink what the correct way of learning robotics world models could be.
Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video generation model is able to learn physical laws. There are three key messages to take home: 1⃣ The model generalises…
BiGym accepted to #CoRL2024! Big thanks to all my collaborators! Our paper and code are available. Check them out! Project page: chernyadev.github.io/bigym/ Simulator code: github.com/chernyadev/big… Baselines: github.com/robobase-org/r…
🚀 Looking for a benchmark for bi-manual mobile manipulation with nicely collected demonstrations? We are excited to release BiGym, a new benchmark with human-collected demos! 🌐 Website: chernyadev.github.io/bigym/ 📄 Paper: arxiv.org/abs/2407.07788 💻 Code: github.com/chernyadev/big…
Introducing CQN: Coarse-to-fine Q-Network, a value-based RL algorithm for continuous control🦾 Initialized with 20~50 demonstrations, it learns to solve real-world robotic tasks within 10 mins of training, without any pre-training or shaped rewards! (1/4) younggyo.me/cqn
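The interval-refinement trick in one toy snippet (my own rendering of the coarse-to-fine discretization idea, not the authors' code; the Q-values are given directly instead of coming from the learned, level-conditioned critics):

```python
import numpy as np

def coarse_to_fine_action(q_values_per_level, low=-1.0, high=1.0, bins=3):
    """q_values_per_level: one array of shape (bins,) per level.
    Each level zooms the continuous interval into its greedy bin."""
    for q in q_values_per_level:
        edges = np.linspace(low, high, bins + 1)
        best = int(np.argmax(q))                   # greedy bin at this resolution
        low, high = edges[best], edges[best + 1]   # zoom into the chosen bin
    return (low + high) / 2.0                      # center of the finest bin

# Example: 3 levels x 3 bins gives 27 effective bins per action dimension.
action = coarse_to_fine_action([np.array([0.1, 0.9, 0.2]),
                                np.array([0.5, 0.2, 0.7]),
                                np.array([0.3, 0.8, 0.1])])
```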
Image-generation diffusion models can draw arbitrary visual patterns. What if we finetune Stable Diffusion to 🖌️ draw joint actions 🦾 on RGB observations? Introducing 𝗚𝗘𝗡𝗜𝗠𝗔 paper, videos, code, ckpts: genima-robot.github.io 🧵Thread⬇️
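To give a flavor of "drawing actions onto the observation", here is a hedged sketch built on a generic diffusers image-to-image pass; the checkpoint name, prompt, file path, and the note about a separate low-level controller are placeholders/assumptions, not GENIMA's released pipeline:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Placeholder checkpoint; GENIMA finetunes its own Stable Diffusion weights.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

obs = Image.open("rgb_observation.png").convert("RGB").resize((512, 512))

target_image = pipe(
    prompt="open the drawer",   # task instruction conditions what gets drawn
    image=obs,                  # current camera view to be annotated
    strength=0.4,               # keep the scene, only add the drawn targets
    guidance_scale=7.5,
).images[0]
# A separate low-level controller would then track the drawn joint targets.
```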