Yilun Du
@du_yilun
Assistant Professor at Harvard @KempnerInst + CS. PhD @MIT_CSAIL, BS MIT. Generative Models, Compositionality, Embodied Agents, Robot Learning.
Excited to share Energy-Based Transformers (EBTs), which allow you to implement system 2 thinking in any modality! EBTs formulate reasoning as an energy optimization problem, allowing models to internally think without complexities like CoT or multiple recurrent latents.
How can we unlock generalized reasoning? ⚡️Introducing Energy-Based Transformers (EBTs), an approach that out-scales (feed-forward) transformers and unlocks generalized reasoning/thinking on any modality/problem without rewards. TLDR: - EBTs are the first model to outscale the…
Check out @YuncongYY's post on test-time scaling for spatial reasoning with world models!
Test-time scaling nailed code & math—next stop: the real 3D world. 🌍 MindJourney pairs any VLM with a video-diffusion World Model, letting it explore an imagined scene before answering. One frame becomes a tour—and the tour leads to new SOTA in spatial reasoning. 🚀 🧵1/
VLMs often struggle with physical reasoning tasks such as spatial reasoning. Excited to share how we can use world models + test-time search to zero-shot improve spatial reasoning in VLMs!
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
Come check out our recent work on history-guided video diffusion tomorrow!
Come visit our #ICML2025 poster on Diffusion Forcing Transformer tomorrow! Stop by to chat about sequence/video diffusion, or anything related to generative and world models. I’ll be presenting with @du_yilun on Thursday, 4:30–7pm at West Hall B2-B3 (#W-205).
I'll be at @icmlconf! Will help present: - Scene Understanding with Generative Models (shorturl.at/JrvJL) - History-guided World Models (shorturl.at/lCkfc) - Adaptable World Models (shorturl.at/99Xmw) We'll also host a workshop on physical world models!
Awesome paper on robot foundation models with super rigorous evaluation. Definitely a must-read!
TRI's latest Large Behavior Model (LBM) paper landed on arXiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/ One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
[1/n] Discussions about LM reasoning and post-training have gained momentum. We identify several missing pieces: ✏️Post-training based on off-the-shelf base models without transparent pre-training data components and scale. ✏️Intermediate checkpoints with incomplete learning…
Excited to share our workshop on Robotics World Modeling @corl_conf 2025! World models have a huge range of applications in robotics from offline simulation, to planning, to reinforcement learning. Consider submitting your work in the area to the workshop!
🤖🌎 We are organizing a workshop on Robotics World Modeling at @corl_conf 2025! We have an excellent group of speakers and panelists, and are inviting you to submit your papers with a July 13 deadline. Website: robot-world-modeling.github.io
What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to…
Super cool results towards general-purpose robots!
Today we're excited to share a glimpse of what we're building at Generalist. As a first step towards our mission of making general-purpose robots a reality, we're pushing the frontiers of what end-to-end AI models can achieve in the real world. Here's a preview of our early…
Excited to share our recent work on how we can flexibly combine visual generative models, VLMs, and simulators for visual synthesis! This enables physics-engine-controlled video generation, graphics-engine-controlled image generation, and compositional image synthesis!
(1/n) Time to unify your favorite visual generative models, VLMs, and simulators for controllable visual generation—Introducing a Product of Experts (PoE) framework for inference-time knowledge composition from heterogeneous models.
Today is the day! Join us at the @CVPR workshop on Foundation Models meet Embodied Agents! 🗓️Jun 11 📍Room 214 🌐…models-meet-embodied-agents.github.io/cvpr2025/ Looking forward to learning insights from wonderful speakers @JitendraMalikCV @RanjayKrishna @KaterinaFragiad @ShuangL13799063 @du_yilun…
NEW: @du_yilun of @GoogleDeepMind & incoming #KempnerInstitute faculty explains how optimizing energy functions can help solve challenging navigation & reasoning problems. Watch the talk: youtube.com/watch?v=UKbLBO… #NeuroAI2025 #ML #neuroscience #NeuroAI