Yixuan Wang
@YXWangBot
CS Ph.D. student @Columbia & Intern @AIatMeta | Prev. Boston Dynamics AI Institute, Google X #Vision #Robotics #Learning
🤔Active robot exploration is critical but hard – long horizons, large spaces, and complex occlusions. How can robots explore like humans? 🤖Introducing CuriousBot, which interactively explores and builds an actionable 3D relational object graph. 🔗curiousbot.theaiinstitute.com 👇Thread (1/9)
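A rough sketch of what an "actionable 3D relational object graph" could look like in code (hypothetical names and structure for illustration only, not the CuriousBot implementation): objects as nodes with 3D positions, spatial relations as edges, and candidate interactions attached to nodes so the robot can pick what to explore next.

```python
# Minimal sketch of an actionable 3D relational object graph.
# Hypothetical structure for illustration only -- not the CuriousBot code.
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    name: str
    position: tuple[float, float, float]               # 3D position of the object
    explored: bool = False                              # has the robot inspected it yet?
    actions: list[str] = field(default_factory=list)    # candidate interactions

@dataclass
class RelationEdge:
    subject: str   # e.g. "mug"
    relation: str  # e.g. "inside", "on_top_of", "occluded_by"
    target: str    # e.g. "drawer"

@dataclass
class SceneGraph:
    nodes: dict[str, ObjectNode] = field(default_factory=dict)
    edges: list[RelationEdge] = field(default_factory=list)

    def frontier(self) -> list[ObjectNode]:
        """Objects that are still unexplored -- candidates for the next action."""
        return [n for n in self.nodes.values() if not n.explored]

# Example: a mug hidden inside a closed drawer suggests an "open" action.
g = SceneGraph()
g.nodes["drawer"] = ObjectNode("drawer", (0.4, 0.0, 0.2), actions=["open"])
g.nodes["mug"] = ObjectNode("mug", (0.4, 0.0, 0.25))
g.edges.append(RelationEdge("mug", "inside", "drawer"))
print([n.name for n in g.frontier()])
```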
🚀 Introducing RIGVid: Robots Imitating Generated Videos! Robots can now perform complex tasks—pouring, wiping, mixing—just by imitating generated videos, purely zero-shot! No teleop. No OpenX/DROID/Ego4D. No videos of human demonstrations. Only AI-generated video demos 🧵👇
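One plausible reading of such a pipeline, written as a toy sketch: generate a video of the task from the current scene image, track the manipulated object's motion in that video, and retarget the trajectory to the robot. All helper names below are placeholders, not the RIGVid API, and the real system may differ.

```python
# Hypothetical sketch of an "imitate a generated video" pipeline.
# Every function here is a stand-in stub, not the RIGVid implementation.
import numpy as np

def generate_task_video(prompt: str, first_frame: np.ndarray) -> np.ndarray:
    """Stand-in for a video generation model conditioned on the scene image."""
    return np.repeat(first_frame[None], 16, axis=0)  # dummy 16-frame "video"

def track_object_pose(video: np.ndarray) -> np.ndarray:
    """Stand-in for a 6-DoF object pose tracker run on the generated video."""
    return np.zeros((video.shape[0], 6))  # (T, 6): xyz + rotation per frame

def retarget_to_robot(object_poses: np.ndarray) -> np.ndarray:
    """Map the object's motion to end-effector waypoints for the robot."""
    return object_poses  # identity mapping in this toy sketch

scene = np.zeros((240, 320, 3), dtype=np.uint8)
video = generate_task_video("pour water from the cup into the bowl", scene)
waypoints = retarget_to_robot(track_object_pose(video))
print(waypoints.shape)  # (16, 6) -- one waypoint per generated frame
```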
I was really impressed by the UMI gripper (@chichengcc et al.), but a key limitation is that **force-related data wasn’t captured**: humans feel haptic feedback through the mechanical springs, but the robot couldn’t leverage that info, limiting the data’s value for fine-grained…
Tactile interaction in the wild can unlock fine-grained manipulation! 🌿🤖✋ We built a portable handheld tactile gripper that enables large-scale visuo-tactile data collection in real-world settings. By pretraining on this data, we bridge vision and touch—allowing robots to:…
It is soooooo awesome to see UMI + Tactile come to life! I am very impressed by how quickly the whole hardware + software system was built. Meanwhile, they even collected lots of data in the wild! Amazing work!!!
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
Had a great time yesterday giving three invited talks at #RSS2025 workshops—on foundation models, structured world models, and tactile sensing for robotic manipulation. Lots of engaging conversations! One more talk coming up on Wednesday (6/25). Also excited to be presenting two…
Just arrived in LA and excited to be at RSS! I will present CodeDiffuser at the following sessions: - Presentation on June 22 (Sun.) 5:30 PM - 6:30 PM - Poster on June 22 (Sun.) 6:30 PM - 8:00 PM I will also present CuriousBot at - FM4RoboPlan Workshop on June 21 (Sat.) 9:40 - 10:10…
🤖 Do VLA models really listen to language instructions? Maybe not 👀 🚀 Introducing our RSS paper: CodeDiffuser -- using VLM-generated code to bridge the gap between **high-level language** and **low-level visuomotor policy** 🎮 Try the live demo: robopil.github.io/code-diffuser/ (1/9)
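A hedged sketch of the "VLM-generated code bridges language and policy" idea: the VLM writes a small snippet that resolves an ambiguous instruction into concrete 3D regions of interest, and those regions condition the low-level policy. The helper names and interface here are assumptions for illustration, not CodeDiffuser's actual API.

```python
# Hedged sketch: VLM-generated code selects task-relevant 3D regions,
# which then condition a low-level visuomotor policy (stubbed here).
import numpy as np

# Pretend perception output: object detections with names and 3D centroids.
detections = [
    {"name": "mug", "center": np.array([0.30, -0.15, 0.05])},   # left mug
    {"name": "mug", "center": np.array([0.30,  0.20, 0.05])},   # right mug
    {"name": "branch", "center": np.array([0.55, 0.10, 0.30])},
]

# Code a VLM might emit for "Hang the left mug on the right branch":
# it turns the ambiguous instruction into a concrete object selection.
generated_code = """
left_mug = min([d for d in detections if d["name"] == "mug"],
               key=lambda d: d["center"][1])
targets = [left_mug["center"]]
"""

namespace = {"detections": detections}
exec(generated_code, namespace)            # run the VLM-written snippet
targets = namespace["targets"]             # task-relevant 3D points

def visuomotor_policy(point_cloud: np.ndarray, attention_points: list) -> np.ndarray:
    """Stub for the low-level policy conditioned on the selected 3D regions."""
    return np.zeros(7)  # e.g. a 7-DoF action

action = visuomotor_policy(np.zeros((1024, 3)), targets)
print(targets, action.shape)
```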
We’ve been exploring 3D world models with the goal of finding the right recipe that is both: (1) structured—for sample efficiency and generalization (my personal emphasis), and (2) scalable—as we increase real-world data collection. With **Particle-Grid Neural Dynamics**…
Can we learn a 3D world model that predicts object dynamics directly from videos? Introducing Particle-Grid Neural Dynamics: a learning-based simulator for deformable objects that trains from real-world videos. Website: kywind.github.io/pgnd ArXiv: arxiv.org/abs/2506.15680…
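The name suggests a hybrid particle/grid state representation; below is a toy sketch of what one rollout step of such a model could look like (particle-to-grid aggregation, grid-to-particle readout, a placeholder "learned" update). The actual Particle-Grid Neural Dynamics architecture may differ; this only illustrates the information flow.

```python
# Toy sketch of one rollout step of a particle-grid dynamics model.
# Not the paper's architecture -- only the particle -> grid -> particle flow.
import numpy as np

rng = np.random.default_rng(0)
N, CELL = 512, 0.05                          # number of particles, grid cell size
pos = rng.uniform(0, 0.5, size=(N, 3))       # particle positions (meters)
vel = rng.normal(scale=0.01, size=(N, 3))    # particle velocities

def particles_to_grid(pos, vel, cell):
    """Average particle velocities into the grid cells they fall in."""
    idx = np.floor(pos / cell).astype(int)
    keys = [tuple(i) for i in idx]
    grid = {}
    for k, v in zip(keys, vel):
        s, c = grid.get(k, (np.zeros(3), 0))
        grid[k] = (s + v, c + 1)
    return {k: s / c for k, (s, c) in grid.items()}, keys

def grid_to_particles(grid, keys):
    """Read the aggregated cell feature back at each particle."""
    return np.stack([grid[k] for k in keys])

# Placeholder for the learned update: a fixed random linear map over
# [particle velocity, aggregated neighborhood velocity].
W = rng.normal(scale=0.1, size=(6, 3))

grid, keys = particles_to_grid(pos, vel, CELL)
neighborhood = grid_to_particles(grid, keys)
delta_v = np.concatenate([vel, neighborhood], axis=1) @ W   # "network" output
vel = vel + delta_v
pos = pos + 0.02 * vel                                      # integrate at 50 Hz
print(pos.shape, vel.shape)
```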
How can we achieve both common-sense understanding that handles varying levels of ambiguity in language and dexterous manipulation? Check out CodeDiffuser, a really neat work that bridges Code Gen with a 3D Diffusion Policy! This was a fun project with cool experiments! 🤖
Check out the cool results and demo!
Two releases in a row from our lab today 😆 One problem I have always been pondering is how to use structured representations while keeping them scalable. Super excited that Kaifeng's work pushes this direction forward, and I cannot wait to see what comes next!!
**Steerability** remains one of the key issues for current vision-language-action models (VLAs). Natural language is often ambiguous and vague: "Hang a mug on a branch" vs "Hang the left mug on the right branch." Many works claim to handle language input, yet the tasks are often…
It is cool to see that you can steer your low-level policy with foundation models. Check out new work from @YXWangBot
Ep#10 with @RogerQiu_42 on Humanoid Policy ~ Human Policy human-as-robot.github.io Co-hosted by @chris_j_paxton & @micoolcho