Yilun Du

@du_yilun

Assistant Professor at Harvard @KempnerInst + CS. PhD @MIT_CSAIL, BS MIT. Generative Models, Compositionality, Embodied Agents, Robot Learning.

Harvard

Joined September 2012

291 Following

10K Followers

Pinned

Yilun Du@du_yilun · Jul 7

Excited to share Energy-Based Transformers (EBTs), which allow you to implement system 2 thinking in any modality! EBTs formulate reasoning as an energy optimization problem, allowing models to think internally without complexities like CoT or multiple recurrent latents.

Alexi Gladstone@AlexiGlad · Jul 7

How can we unlock generalized reasoning? ⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards. TLDR: - EBTs are the first model to outscale the…

Yilun Du@du_yilun · Jul 22

Check out @YuncongYY's post on test-time scaling for spatial reasoning with world models!

Yuncong Yang@YuncongYY · Jul 21

Test-time scaling nailed code & math—next stop: the real 3D world. 🌍 MindJourney pairs any VLM with a video-diffusion World Model, letting it explore an imagined scene before answering. One frame becomes a tour—and the tour leads to new SOTA in spatial reasoning. 🚀 🧵1/

Yilun Du@du_yilun · Jul 18

VLMs often struggle with physical reasoning tasks such as spatial reasoning. Excited to share how we can use world models + test-time search to zero-shot improve spatial reasoning in VLMs!

AK@_akhaliq · Jul 18

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning

Yilun Du@du_yilun · Jul 17

Come check out our recent work on history-guided video diffusion tomorrow!

Kiwhan Song@kiwhansong0 · Jul 17

Come visit our #ICML2025 poster on Diffusion Forcing Transformer tomorrow! Stop by to chat about sequence/video diffusion, or anything related to generative and world models. I’ll be presenting with @du_yilun on Thursday, 4:30–7pm at West Hall B2-B3 (#W-205).

Yilun Du@du_yilun · Jul 14

I'll be at @icmlconf! Will help present:
- Scene Understanding with Generative Models (shorturl.at/JrvJL)
- History-guided World Models (shorturl.at/lCkfc)
- Adaptable World Models (shorturl.at/99Xmw)
We'll also host a workshop on physical world models!

Yilun Du@du_yilun · Jul 9

Awesome paper on robot foundation models with super rigorous evaluation. Definitely a must-read!

Russ Tedrake@RussTedrake · Jul 9

TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…

Yilun Du Retweeted

Hanlin Zhang@_hanlin_zhang_ · Jul 2

[1/n] Discussions about LM reasoning and post-training have gained momentum. We identify several missing pieces: ✏️Post-training based on off-the-shelf base models without transparent pre-training data components and scale. ✏️Intermediate checkpoints with incomplete learning…

Yilun Du@du_yilun · Jul 3

Excited to share our workshop on Robotics World Modeling @corl_conf 2025! World models have a huge range of applications in robotics from offline simulation, to planning, to reinforcement learning. Consider submitting your work in the area to the workshop!

Sean Kirmani@SeanKirmani · Jul 3

🤖🌎 We are organizing a workshop on Robotics World Modeling at @corl_conf 2025! We have an excellent group of speakers and panelists, and are inviting you to submit your papers with a July 13 deadline. Website: robot-world-modeling.github.io

Yilun Du Retweeted

Yutong Bai@YutongBAI1002 · Jun 27

What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to…

Yilun Du@du_yilun · Jun 17

Super cool results towards general-purpose robots!

Generalist@GeneralistAI_ · Jun 17

Today we're excited to share a glimpse of what we're building at Generalist. As a first step towards our mission of making general-purpose robots a reality, we're pushing the frontiers of what end-to-end AI models can achieve in the real world. Here's a preview of our early…

Yilun Du@du_yilun · Jun 14

Excited to share our recent work on how we can flexibly combine visual generative models, VLMs, and simulators for visual synthesis! This enables physics-engine-controlled video generation, graphics-engine-controlled image generation, and compositional image synthesis!

Yunzhi Zhang@zhang_yunzhi · Jun 14

(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.

Yilun Du Retweeted

Manling Li@ManlingLi_ · Jun 11

Today is the day! Welcome to join @CVPR workshop on Foundation Models meet Embodied Agents! 🗓️Jun 11 📍Room 214 🌐…models-meet-embodied-agents.github.io/cvpr2025/ Looking forward to learning insights from wonderful speakers @JitendraMalikCV @RanjayKrishna @KaterinaFragiad @ShuangL13799063 @du_yilun…

Yilun Du Retweeted

Kempner Institute at Harvard University@KempnerInst · Jun 9

NEW: @du_yilun of @GoogleDeepMind & incoming #KempnerInstitute faculty explains how optimizing energy functions can help solve challenging navigation & reasoning problems. Watch the talk: youtube.com/watch?v=UKbLBO… #NeuroAI2025 #ML #neuroscience #NeuroAI
