Yining Hong
@yining_hong
💻Postdoc in CS AI @stanford | 🤖embodied 3D foundation models | 3D-LLMs | embodied world models | Musician 🎸Multi-Instrumentalist & Composer | Metalhead 🤘🏼
3D-LLM has reached 200 citations within one year of its acceptance🎉
3D-LLM: Injecting the 3D World into Large Language Models
paper page: huggingface.co/papers/2307.12…
Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not…
Thank you AK! All data, code, and web environments are available at embodied-web-agent.github.io
Paper link: arxiv.org/abs/2506.15677
Huggingface link: huggingface.co/papers/2506.15…
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
Meet Embodied Web Agents, which bridge physical-digital realms. Imagine embodied agents that can search for online recipes, shop for ingredients, and cook for you. Embodied web agents search the internet for the information needed to carry out real-world embodied tasks. All data, code and web…
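For intuition, here is a minimal sketch of how such a physical-digital loop could be wired up. Everything here is an illustrative assumption, not the released codebase: `policy`, `web_env`, `embodied_env`, and their methods are hypothetical placeholders for a step-predicting agent and the two environments.

```python
# Minimal sketch (hypothetical API): an agent loop that routes each predicted
# step to either the web environment or the physical environment.
from dataclasses import dataclass

@dataclass
class Step:
    domain: str    # "web" or "embodied"
    action: str    # e.g. "search", "click", "navigate", "pick", "cook"
    argument: str  # free-form argument for the action

def run_episode(policy, web_env, embodied_env, task: str, max_steps: int = 50):
    """Interleave web actions and embodied actions until the task is done."""
    observation = embodied_env.reset(task)
    for _ in range(max_steps):
        step: Step = policy.predict(task, observation)
        if step.domain == "web":
            # e.g. look up a recipe, read a product page, place an order
            observation = web_env.execute(step.action, step.argument)
        else:
            # e.g. navigate the kitchen, pick up an ingredient, cook
            observation = embodied_env.execute(step.action, step.argument)
        if embodied_env.task_done():
            break
```

The point of the sketch is only the routing: a single policy emits steps, and the same loop dispatches them across the digital and physical realms.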
#CVPR2025 @CVPR Hope y’all enjoyed the house band performance last night! See you next year in Denver!🤘🏼
3DLLM-Mem won the Best Paper Award at the Foundation Models Meet Embodied Agents Workshop! Congrats to our first author @gordonhu608
🤔How can a 3D embodied AI agent maintain long-term memory across dynamic spatial-temporal environment changes in complex tasks? 🚀Introducing 3DLLM-Mem, a memory-enhanced 3D embodied agent that incrementally builds and maintains a task-relevant long-term memory while it…
So excited to announce 3DLLM-Mem! 3DLLM-Mem maintains long-term spatial-temporal memory in large 3D scenes using a memory fusion mechanism incorporated into 3D-LLMs. Come check it out!
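As a rough illustration of what a memory fusion mechanism could look like (my assumption, not the 3DLLM-Mem release): tokens from the current observation cross-attend over a bank of previously stored 3D feature tokens, and the bank is grown incrementally under a fixed budget. All names, shapes, and the eviction policy below are illustrative.

```python
# Hedged sketch of a memory-fusion layer: working-memory tokens attend over a
# long-term memory bank, then the bank is updated incrementally.
import torch
import torch.nn as nn

class MemoryFusion(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, working: torch.Tensor, memory_bank: torch.Tensor):
        # working:     (B, Nw, D) tokens from the current observation
        # memory_bank: (B, Nm, D) task-relevant tokens accumulated so far
        fused, _ = self.attn(query=working, key=memory_bank, value=memory_bank)
        return self.norm(working + fused)  # residual fusion with the memory

def update_bank(memory_bank: torch.Tensor, new_tokens: torch.Tensor,
                budget: int = 2048) -> torch.Tensor:
    """Append the newest tokens, keeping only a fixed token budget."""
    bank = torch.cat([memory_bank, new_tokens], dim=1)
    return bank[:, -budget:]  # naive FIFO eviction as a placeholder policy
```

The fixed budget is what keeps the memory "task-relevant" in this toy version; a real system would presumably score tokens for relevance rather than evict FIFO.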
[ICLR2025 Spotlight] SlowFast-VGen uses test-time training to maintain an episodic memory of the long videos it generates, mimicking the fast learning of the hippocampus and thereby keeping long video generation consistent. See you tmr!
🎬Meet SlowFast-VGen: an action-conditioned long video generation system that learns like a human brain! 🧠Slow learning builds the world model, while fast learning captures memories, enabling incredibly long, consistent videos that respond to your actions in real time…
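One plausible reading of the slow/fast split, sketched under my own assumptions (the `model.*` methods and `fast_params` are hypothetical placeholders, not SlowFast-VGen's API): the pretrained video model provides frozen slow weights, while a small set of fast weights is optimized at test time on each chunk the model just generated, serving as episodic memory for later chunks.

```python
# Illustrative test-time-training loop: only fast_params are updated.
import torch

def generate_long_video(model, fast_params, actions, chunk_len=16, lr=1e-4):
    """Slow weights (the pretrained model) stay frozen; fast_params move."""
    opt = torch.optim.SGD(fast_params, lr=lr)
    video, context = [], model.initial_context()
    for action_chunk in actions:
        with torch.no_grad():  # generation itself runs with frozen slow weights
            frames = model.rollout(context, action_chunk, chunk_len)
        video.append(frames)
        # Test-time training: re-predict the chunk through the fast weights and
        # regress onto what was just generated, storing it as episodic memory.
        loss = (model.fast_predict(context, action_chunk) - frames).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        context = model.update_context(context, frames)
    return torch.cat(video, dim=0)
```

In this toy version, consistency comes from the fast weights having "seen" every earlier chunk, so later rollouts are conditioned on more than the sliding context window.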
Excited to host the 1st Workshop on 3D-LLM/VLA at #CVPR2025! @CVPR This workshop explores integrating LLMs and VLA models with 3D perception to enhance foundation models for embodied agents and robot control.
Paper Deadline: April 20, 2025
Website: 3d-llm-vla.github.io
SlowFast-VGen has been accepted to ICLR as a spotlight paper with scores of 8-8-8-6 🎉