Haoyu Xiong
@Haoyu_Xiong_
Incoming PhD student @MIT EECS
Your bimanual manipulators might need a Robot Neck 🤖🦒 Introducing Vision in Action: Learning Active Perception from Human Demonstrations ViA learns task-specific, active perceptual strategies—such as searching, tracking, and focusing—directly from human demos, enabling robust…
Just reread the tidybot2.github.io docs today. What an incredible tutorial for building a robot system. Honestly, you could set up an entire new robot lab just by following it; @jimmyyhwu even gave you the link to the screwdriver he used 😂
Extrapolating this trend to robotics, I believe that if one is doing sim2real they should prefer Autoregressive > Diffusion (compute bottleneck), but if they are doing real-world training then Autoregressive < Diffusion (data bottleneck). We don't empirically validate this for…
🚨 The era of infinite internet data is ending. So we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
Agency > Intelligence. I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ, etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are…
Intelligence is on tap now so agency is even more important
Check out @binghao_huang’s great work on scaling up tactile interaction in the wild!
Tactile interaction in the wild can unlock fine-grained manipulation! 🌿🤖✋ We built a portable handheld tactile gripper that enables large-scale visuo-tactile data collection in real-world settings. By pretraining on this data, we bridge vision and touch—allowing robots to:…
At a robotics lab in Pittsburgh, engineers are building adaptable, AI-powered robots that could one day work where it's too dangerous for humans. The research drew a visit from President Trump, who touted U.S. dominance in AI as companies announced $90 billion in new investments.
We’re building robots that work. Ultra's intelligent warehouse robots deploy in hours (not weeks), and adapt to real world chaos. Our robots are already packaging e-commerce orders in 3PL warehouses across the US. We create value for customers on day 1 using teleop control, and…
Compression is the heart of intelligence. From Occam to Kolmogorov—shorter programs = smarter representations. Meet KARL: Kolmogorov-Approximating Representation Learning. Given an image, a token budget T, and a target quality 𝜖, KARL finds the smallest t ≤ T to reconstruct it within 𝜖 🧵
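A minimal sketch of the idea as the tweet states it, not KARL's actual code: search for the smallest token count t ≤ T whose reconstruction lands within 𝜖, assuming hypothetical `encode`/`decode`/`reconstruction_error` functions and that error shrinks as t grows.

```python
# Hypothetical sketch of the stated objective (not the authors' implementation):
# find the smallest t <= T whose reconstruction error is within eps, else use T.
def smallest_sufficient_tokens(image, T, eps, encode, decode, reconstruction_error):
    """Return the smallest token count t <= T that reconstructs `image` within `eps`."""
    lo, hi, best = 1, T, T
    while lo <= hi:                      # binary search assumes error is non-increasing in t
        t = (lo + hi) // 2
        recon = decode(encode(image, num_tokens=t))
        if reconstruction_error(image, recon) <= eps:
            best, hi = t, t - 1          # good enough: try fewer tokens
        else:
            lo = t + 1                   # too lossy: need more tokens
    return best
```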
This is wild…
We’re putting AI glasses on Chinese factory workers to replace them with robots
Robot learning has largely focused on standard platforms—but can it embrace robots of all shapes and sizes? In @XiaomengXu11's latest blog post, we show how data-driven methods bring unconventional robots to life, enabling capabilities that traditional designs and control can't…
I have a new favourite blog site
It is insane how underrated these blogs are. The man made an interactive visualization for different kinds of attention mechanisms (he has interactive visualizations for RNNs, LSTMs, CNNs, and so much more)
Now in Nature! 🚀 Our method learns a controllable 3D model of any robot from vision, enabling single-camera closed-loop control at test time! This includes robots previously uncontrollable, soft, and bio-inspired, potentially lowering the barrier of entry to automation! Paper:…
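Purely illustrative and not the paper's method: a generic single-camera closed-loop step, assuming a hypothetical learned model that predicts the camera view resulting from a candidate command, with `camera`, `model`, and `send_command` as placeholders.

```python
# Illustrative sketch only: sampling-based closed-loop control from one camera,
# using an assumed learned model that maps (observation, command) -> predicted image.
import numpy as np

def closed_loop_step(camera, model, send_command, goal_image, candidates):
    """Pick and execute the candidate command whose predicted view best matches the goal."""
    obs = camera.read()                                                  # current camera frame
    errors = [
        np.mean((model.predict_image(obs, cmd) - goal_image) ** 2)      # predicted visual error
        for cmd in candidates
    ]
    best_cmd = candidates[int(np.argmin(errors))]
    send_command(best_cmd)                                               # execute, then repeat next frame
    return best_cmd
```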
Check out @mangahomanga’s new work on one-shot human imitation
Presenting DemoDiffusion: an extremely simple approach enabling a pre-trained 'generalist' diffusion policy to follow a human demonstration for a novel task during inference. One-shot human imitation *without* requiring any paired human-robot data or online RL 🙂 1/n
Witnessed the process of @Haoyu_Xiong_ building up the entire system from scratch. Amazing to see the outcomes! Robots operating in cluttered environments with many occlusions are still an unaddressed challenge. Your robot really needs a neck for that, and it can have as many as 6 DoFs 🐍
Haoyu built an awesome bimanual + neck robot which can be easily mounted on the TidyBot++ mobile base. Hardware design is fully open source! Check out his thread below to learn more 👇
We’ve open-sourced everything: vision-in-action.github.io arXiv: arxiv.org/abs/2506.15666 GitHub: github.com/haoyu-x/vision… Hardware: github.com/haoyu-x/vision… Thanks to my incredible collaborators @XiaomengXu11 @jimmyyhwu @YifanHou2. Thanks to Jeannette @leto__jean for her…
Teleoperating a robot feels unnatural — not just because of limited arm or hand DoFs, but also because of the lack of perceptual freedom! Humans naturally move their head and torso to search, track, and focus — far beyond a simple 2-DoF camera. How to get there? Check out…
Assemble a minimal humanoid using off-the-shelf arms and just a few frame components!
ViA shows robust visual understanding. In the Lime & Pot task, the lime is randomly placed and often not visible at first. The robot learns to look around and search for the object first before initiating arm actions. Check the rollouts👇 5/7