Wenlong Huang
@wenlong_huang
PhD Student @StanfordSVL @StanfordAILab. Previously @Berkeley_AI @GoogleDeepMind. Robotics, Foundation Models.
What structural task representation enables multi-stage, in-the-wild, bimanual, reactive manipulation? Introducing ReKep: LVM to label keypoints & VLM to write keypoint-based constraints, solve w/ optimization for diverse tasks, w/o task-specific training or env models. 🧵👇
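To make the ReKep recipe concrete, here is a minimal sketch of the idea in Python: a constraint is just a small function over labeled 3D keypoints, and a generic optimizer finds an end-effector subgoal that drives its cost to zero. The keypoints, the pouring constraint, and the L-BFGS solve below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (not the authors' code): a VLM-style constraint written over
# tracked 3D keypoints, solved with a generic optimizer.
import numpy as np
from scipy.optimize import minimize

# Pretend these were labeled by a large vision model on the current RGB-D frame.
keypoints = {
    "cup_rim": np.array([0.45, 0.10, 0.20]),
    "bottle_mouth": np.array([0.60, -0.05, 0.35]),
}

def pour_alignment_cost(ee_pos: np.ndarray) -> float:
    """Illustrative constraint: the bottle mouth (rigidly attached to the
    gripper) should sit 5 cm above the cup rim before tilting."""
    target = keypoints["cup_rim"] + np.array([0.0, 0.0, 0.05])
    return float(np.linalg.norm(ee_pos - target))

def solve_subgoal(cost_fn, x0):
    """Find an end-effector position that satisfies the written constraint."""
    res = minimize(cost_fn, x0, method="L-BFGS-B")
    return res.x

if __name__ == "__main__":
    ee_start = keypoints["bottle_mouth"]   # gripper currently at the bottle mouth
    ee_goal = solve_subgoal(pour_alignment_cost, ee_start)
    print("next end-effector subgoal:", ee_goal)
```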
🚀 Introducing RIGVid: Robots Imitating Generated Videos! Robots can now perform complex tasks—pouring, wiping, mixing—just by imitating generated videos, purely zero-shot! No teleop. No OpenX/DROID/Ego4D. No videos of human demonstrations. Only AI-generated video demos 🧵👇
🚨 The era of infinite internet data is ending, so we ask: 👉 What’s the right generative modelling objective when data, not compute, is the bottleneck? TL;DR: ▶️ Compute-constrained? Train autoregressive models. ▶️ Data-constrained? Train diffusion models. Get ready for 🤿 1/n
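One commonly cited intuition behind that TL;DR, shown as a toy sketch: when the same sequence must be reused for many epochs, a masked/diffusion-style objective sees a fresh corruption on every pass, while next-token targets are identical each time. The token IDs and mask rate below are made up purely for illustration, not taken from the paper.

```python
# Toy illustration: repeated epochs look different to a masked-diffusion
# objective (fresh random corruption each pass) but identical to next-token
# prediction (same input/target pairs every epoch).
import random

sequence = [5, 9, 2, 7, 3, 8]   # one "document" we must reuse for many epochs

def ar_targets(seq):
    # Autoregressive: (context token, next token) pairs never change.
    return list(zip(seq[:-1], seq[1:]))

def masked_diffusion_view(seq, mask_rate=0.5, mask_id=-1):
    # Masked-diffusion style: a new random corruption of the same sequence.
    return [mask_id if random.random() < mask_rate else tok for tok in seq]

for epoch in range(3):
    print("AR targets      :", ar_targets(sequence))
    print("diffusion input :", masked_diffusion_view(sequence))
```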
Excited that @RuohanZhang76 is joining NU @northwesterncs ! If you are thinking about pursuing a PhD, definitely reach out to him! During my wonderful year at @StanfordAILab @StanfordSVL, when I was completely new to robotics, he was the nicest person who was incredibly patient…
📢 Beginning this fall, four new tenure-track, clinical, and visiting faculty members will join our department! 📢 We are thrilled to welcome Shaddin Dughmi, Sidhanth Mohanty, Lydia Tse, and Ruohan Zhang! Meet the newest members of our team: spr.ly/6019fGdXv
Tactile interaction in the wild can unlock fine-grained manipulation! 🌿🤖✋ We built a portable handheld tactile gripper that enables large-scale visuo-tactile data collection in real-world settings. By pretraining on this data, we bridge vision and touch—allowing robots to:…
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
Exciting to see more works leveraging VLM-inferred keypoints as a bridge between semantic knowledge and low-level behaviors, especially those dexterous skills 🤩
We find keypoint trajectories to be a powerful interface between VLM planning & RL control. VLM: generates an object + hand motion plan from a task prompt & RGB-D image (perception + commonsense). RL policy: conditioned on the plan, learns low-level dexterous control (0-shot sim2real).
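A minimal sketch of what that interface might look like in code: the VLM's output is just arrays of 3D waypoints, and the low-level policy's observation stitches the current tracked keypoints together with the next waypoint in the plan. The class, shapes, and dummy data below are illustrative assumptions, not the paper's actual code.

```python
# Sketch of keypoint trajectories as the interface between a VLM planner and
# a low-level RL policy (assumed structure, for illustration only).
from dataclasses import dataclass
import numpy as np

@dataclass
class KeypointPlan:
    object_traj: np.ndarray   # (T, K_obj, 3) waypoints emitted by the VLM
    hand_traj: np.ndarray     # (T, K_hand, 3)

def policy_observation(plan: KeypointPlan, t: int, tracked_obj, tracked_hand):
    """Concatenate the currently tracked keypoints with the next planned waypoints."""
    nxt = min(t + 1, plan.object_traj.shape[0] - 1)
    return np.concatenate([
        tracked_obj.ravel(), tracked_hand.ravel(),
        plan.object_traj[nxt].ravel(), plan.hand_traj[nxt].ravel(),
    ])

# Usage with dummy data: 2 object keypoints, 3 hand keypoints, a 10-step plan.
plan = KeypointPlan(np.zeros((10, 2, 3)), np.zeros((10, 3, 3)))
obs = policy_observation(plan, t=0,
                         tracked_obj=np.zeros((2, 3)),
                         tracked_hand=np.zeros((3, 3)))
print(obs.shape)
```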
Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reasoning about unseen objects behind furniture / beyond current view? Check out MindCube! 🌐mll-lab-nu.github.io/mind-cube/ 📰arxiv.org/pdf/2506.21458…
“As a PhD student, your job is not publishing a paper every quarter. Focus on one problem, understand it deeply, and solve it over years under the protection of your adviser” from @RussTedrake #RSS2025
Tesla Robotaxi: A New Era Begins I’ve (very fortunately) been part of multiple robotaxi launches. But this one is different and feels much more profound. It’s a paradigm shift. It’s the GPT moment for real-world autonomy. Tesla’s robotaxi runs vision-only -- no lidar, no radar,…
The future of transportation is here with Tesla robotaxi
Attending RSS for the first time and giving a talk tomorrow at the Learning Structured World Models for Robotic Manipulation workshop! At midnight, I made a last-minute crazy decision to change my talk content to Virtual Community — to honor the incredible hard work of my…
World Simulator, reimagined — now alive with humans, robots, and their vibrant society unfolding in 3D real-world geospatial scenes across the globe! 🚀 One day soon, humans and robots will co-exist in the same world. To prepare, we must address: 1️⃣ How can robots cooperate or…
Join us tomorrow in SGM 124 for the SWOMO workshop at #RSS2025! We will have 6 amazing talks and a closing panel to discuss structured world modeling for robotics! Latest schedule and information at swomo-rss.github.io
Excited to announce the “Structured World Models for Robotic Manipulation” workshop at #RSS2025 in LA! Website: swomo-rss.github.io Call for Papers (Deadline: May 23): swomo-rss.github.io/index.html#call Come join us with a stellar lineup of speakers to discuss the various important &…
Can we learn a 3D world model that predicts object dynamics directly from videos? Introducing Particle-Grid Neural Dynamics: a learning-based simulator for deformable objects that trains from real-world videos. Website: kywind.github.io/pgnd ArXiv: arxiv.org/abs/2506.15680…
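A hedged sketch of the hybrid particle-grid idea named in the title: particle states are scattered onto a regular grid, a learned update runs on the grid, and the result is gathered back to advance the particles. The nearest-cell scatter and the linear stand-in for the learned grid network below are assumptions for illustration, not the released code.

```python
# Sketch of one particle-grid dynamics step: particles -> grid -> learned
# update (stubbed) -> particles. Shapes and resolution are illustrative.
import numpy as np

def scatter_to_grid(pos, feat, grid_res=16):
    """Average particle features into grid cells (nearest-cell assignment)."""
    grid = np.zeros((grid_res,) * 3 + (feat.shape[1],))
    counts = np.zeros((grid_res,) * 3 + (1,))
    idx = np.clip((pos * grid_res).astype(int), 0, grid_res - 1)
    for (i, j, k), f in zip(idx, feat):
        grid[i, j, k] += f
        counts[i, j, k] += 1
    return grid / np.maximum(counts, 1)

def step(pos, vel, W, dt=0.02, grid_res=16):
    grid = scatter_to_grid(pos, vel, grid_res)   # particle -> grid
    grid_out = grid @ W                          # stand-in for a learned grid network
    idx = np.clip((pos * grid_res).astype(int), 0, grid_res - 1)
    new_vel = grid_out[idx[:, 0], idx[:, 1], idx[:, 2]]   # grid -> particle
    return pos + dt * new_vel, new_vel

pos = np.random.rand(200, 3)   # particles sampled from a tracked object
vel = np.zeros((200, 3))
W = np.eye(3)                  # placeholder for learned weights
pos, vel = step(pos, vel, W)
print(pos.shape, vel.shape)
```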
🤖 Do VLA models really listen to language instructions? Maybe not 👀 🚀 Introducing our RSS paper: CodeDiffuser -- using VLM-generated code to bridge the gap between **high-level language** and **low-level visuomotor policy** 🎮 Try the live demo: robopil.github.io/code-diffuser/ (1/9)
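A hedged sketch of the bridging idea from the tweet: instead of feeding raw language to the policy, a VLM writes a short program against simple perception primitives, and the program's output (here a 3D attention map over points) is what conditions the low-level policy. The helpers detect, attention_map, and the commented-out diffusion_policy call are hypothetical names for illustration, not CodeDiffuser's actual API.

```python
# Sketch: VLM-generated code turns an ambiguous instruction into a spatial
# attention map that conditions a visuomotor policy (illustrative only).
import numpy as np

def detect(obs, name):
    """Stand-in open-vocabulary detector returning a 3D centroid for `name`."""
    return np.random.rand(3)

def attention_map(points, center, sigma=0.1):
    """Soft attention over a point cloud, peaked at the referenced object."""
    d = np.linalg.norm(points - center, axis=-1)
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

# --- what a VLM might emit for "pick up the mug next to the kettle" ---
def task_program(obs, points):
    mugs = [detect(obs, "mug"), detect(obs, "mug")]               # two candidate mugs
    kettle = detect(obs, "kettle")
    target = min(mugs, key=lambda m: np.linalg.norm(m - kettle))  # "next to the kettle"
    return attention_map(points, target)

# The attention map, not raw language, is what conditions the low-level policy:
points = np.random.rand(1024, 3)
attn = task_program(obs=None, points=points)
# action = diffusion_policy(obs, attn)   # hypothetical low-level policy call
print(attn.shape)
```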
Your bimanual manipulators might need a Robot Neck 🤖🦒 Introducing Vision in Action: Learning Active Perception from Human Demonstrations ViA learns task-specific, active perceptual strategies—such as searching, tracking, and focusing—directly from human demos, enabling robust…
Today we're excited to share a glimpse of what we're building at Generalist. As a first step towards our mission of making general-purpose robots a reality, we're pushing the frontiers of what end-to-end AI models can achieve in the real world. Here's a preview of our early…
Today is the day! Come join the @CVPR workshop on Foundation Models meet Embodied Agents! 🗓️Jun 11 📍Room 214 🌐…models-meet-embodied-agents.github.io/cvpr2025/ Looking forward to learning insights from wonderful speakers @JitendraMalikCV @RanjayKrishna @KaterinaFragiad @ShuangL13799063 @du_yilun…
I always found it puzzling how language models learn so much from next-token prediction, while video models learn so little from next frame prediction. Maybe it's because LLMs are actually brain scanners in disguise. Idle musings in my new blog post: sergeylevine.substack.com/p/language-mod…
Very impressed with Veo 3 and all the things people are finding on r/aivideo etc. Makes a big difference qualitatively when you add audio. There are a few macro aspects to video generation that may not be fully appreciated: 1. Video is the highest bandwidth input to brain. Not…
It's been only a day since Google dropped Veo 3. The new model creates video and audio simultaneously from a single prompt! Here are 13 wild examples so far: 1. Self-aware AI characters
Language-conditioned policies are kind of boring until we have sensorimotor data that reaches even a fraction of language's diversity. Until then, language is just a one-hot task encoding.
Two days into #ICRA2025 @ieee_ras_icra—great connecting with folks! Gave a talk, moderated a panel, and got a *Best Paper Award* 🏆 at the workshops. Up next: four papers and two more workshop talks/panels. Excited to chat robot learning and the road to general intelligence! 🤖