Wenbo Hu
@gordonhu608
CS PhD Student @UCLA | Multimodal & Embodied AI & Spatial Intelligence | B.S. @UCSanDiego
🤔How can a 3D embodied AI agent maintain a long-term memory across dynamic spatial-temporal environment changes in complex tasks? 🚀Introducing 3DLLM-Mem, a memory-enhanced 3D embodied agent that incrementally builds and maintains a task-relevant long-term memory while it…
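A minimal, hypothetical sketch of the kind of incremental memory loop described above: the agent keeps appending embedded observations to a long-term bank and retrieves only the entries relevant to the current task. The class and function names are illustrative, not the 3DLLM-Mem API.

```python
# Minimal, hypothetical sketch of an incremental long-term memory for an
# embodied agent: store embedded observations, retrieve task-relevant ones.
# Names and shapes are illustrative, not the 3DLLM-Mem implementation.
import numpy as np

class LongTermMemory:
    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))   # embedded observations
        self.values = []                 # raw observations / feature payloads

    def write(self, key: np.ndarray, value) -> None:
        """Incrementally append a new observation to the memory bank."""
        self.keys = np.vstack([self.keys, key[None, :]])
        self.values.append(value)

    def retrieve(self, query: np.ndarray, k: int = 3):
        """Return the k stored entries most similar to the current task query."""
        if len(self.values) == 0:
            return []
        sims = self.keys @ query / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        )
        top = np.argsort(-sims)[:k]
        return [self.values[i] for i in top]

# Toy usage: the agent writes what it sees at each step and later recalls
# the entries relevant to the current sub-task.
rng = np.random.default_rng(0)
memory = LongTermMemory(dim=32)
for step in range(10):
    obs_embedding = rng.normal(size=32)
    memory.write(obs_embedding, value=f"observation at step {step}")

task_query = rng.normal(size=32)
print(memory.retrieve(task_query, k=3))
```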

Please check out Embodied Web Agents! One agent that brings digital web knowledge into physical, real-world actions.
Meet Embodied Web Agents, which bridge the physical and digital realms. Imagine embodied agents that can search for online recipes, shop for ingredients, and cook for you. Embodied Web Agents search the internet for information to carry out real-world embodied tasks. All data, code, and web…
So grateful for the Best Paper award. Congratulations to the whole team!
3DLLM-Mem won the Best Paper award at the Foundation Models Meet Embodied Agents Workshop! Congrats to our first author @gordonhu608
This work will be presented as an oral at the Foundation Models Meet Embodied Agents workshop at #CVPR2025 (Wed 6/11, 10am). Please join to hear @yining_hong present our work.
Introducing 😶🌫️DreamGen, the pioneering approach to neural trajectories + robotics at NVIDIA GEAR lab. We’re among the first to show how large-scale synthetic data can significantly improve a robot’s ability to generalize to new actions and environments. If you’re interested,…
Excited to be at #ICLR2025 🇸🇬 from 4/24 to 4/28 to share this work on Multimodal RAG. Presenting on Saturday 4/26, 3pm - 5:30pm at Hall 3 + Hall 2B #108. I'm also happy to chat about multimodal models, 3D vision-language, and embodied AI in general with old…
🚀Introducing MRAG-Bench: How do Large Vision-Language Models utilize vision-centric multimodal knowledge? 🤔Previous multimodal knowledge QA benchmarks can mostly be solved by retrieving text knowledge.💥We focus on scenarios where retrieving knowledge from an image corpus is more…
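For intuition, here is a hedged sketch of vision-centric retrieval-augmented generation in the spirit of the setting above: retrieve the most similar images from an image corpus and hand them to an LVLM together with the question. `embed_image` and `answer_with_lvlm` are stand-ins, not MRAG-Bench code.

```python
# Hypothetical sketch of vision-centric retrieval-augmented generation:
# retrieve the most similar images from an image corpus for a query image,
# then pass them to a vision-language model alongside the question.
# embed_image and answer_with_lvlm are stand-ins, not the MRAG-Bench code.
import numpy as np

def embed_image(image_id: str, dim: int = 64) -> np.ndarray:
    # Stand-in for a real image encoder (e.g. a CLIP vision tower).
    rng = np.random.default_rng(abs(hash(image_id)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

corpus = [f"corpus_image_{i}.jpg" for i in range(100)]
corpus_embeddings = np.stack([embed_image(name) for name in corpus])

def retrieve_images(query_image: str, k: int = 5) -> list[str]:
    """Nearest-neighbor search over the image corpus (cosine similarity)."""
    q = embed_image(query_image)
    sims = corpus_embeddings @ q
    return [corpus[i] for i in np.argsort(-sims)[:k]]

def answer_with_lvlm(question: str, query_image: str, retrieved: list[str]) -> str:
    # Stand-in for the actual LVLM call that consumes all images + the question.
    return f"answer conditioned on {query_image} and {len(retrieved)} retrieved images"

retrieved = retrieve_images("query_photo.jpg", k=5)
print(answer_with_lvlm("What species is this?", "query_photo.jpg", retrieved))
```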
📣 For this week’s NLP Seminar, we are thrilled to host Zhe Gan @zhegan4 to give a talk titled “How to Build Your Multimodal LLMs: From Pre-training to Post-training and Agents”! 🗓️ 4/11 Fri 2pm PT Registration: forms.gle/TNXfBZJiMJjL18…
🚀Excited to share our latest work: OpenVLThinker, an exploration into enhancing vision-language models with R1 reasoning capabilities. By iteratively integrating SFT and RL, we enable LVLMs to exhibit robust R1 reasoning behavior. As a result, OpenVLThinker achieves a 70.2%…
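A toy sketch of what an iterative SFT-then-RL loop can look like, using a stand-in model so the loop actually runs; `ToyModel` and its updates are illustrative, not the OpenVLThinker training code.

```python
# Hypothetical sketch of the iterative SFT -> RL loop for distilling reasoning
# behavior into a vision-language model. ToyModel and its methods are
# illustrative stand-ins, not the OpenVLThinker code or API.
import random

class ToyModel:
    """Stand-in for an LVLM; 'skill' abstracts its reasoning quality."""
    def __init__(self, skill: float = 0.3):
        self.skill = skill

    def solve(self, prompt: str) -> tuple[str, bool]:
        # Returns a reasoning trace and whether a verifier accepted the answer.
        correct = random.random() < self.skill
        return f"trace for {prompt!r}", correct

def iterate_sft_rl(model: ToyModel, prompts: list[str], rounds: int = 3) -> ToyModel:
    for r in range(rounds):
        # 1) Sample reasoning traces and keep only the verified-correct ones.
        traces = [t for t, ok in (model.solve(p) for p in prompts) if ok]
        # 2) SFT on the filtered traces seeds the reasoning behavior (toy update).
        model.skill += 0.05 * len(traces) / max(len(prompts), 1)
        # 3) RL with a correctness reward sharpens that behavior (toy update).
        model.skill = min(1.0, model.skill + 0.05)
        print(f"round {r}: kept {len(traces)} traces, skill={model.skill:.2f}")
    return model

random.seed(0)
iterate_sft_rl(ToyModel(), [f"problem {i}" for i in range(20)])
```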
Excited to share MRAG-Bench is accepted at #ICLR2025 🇸🇬. The image corpus is a rich source of information, and extracting knowledge from it can often be more advantageous than from a text corpus. We study how MLLMs can utilize vision-centric multimodal knowledge. More in our…
Had an incredible experience at #NeurIPS2024 ! It was fantastic to connect with so many people interested in our work and to gain valuable insights and inspiration for the future of multimodal research. I’m deeply grateful for the opportunity to present our work with my amazing…

1/ I'll be at #NeurIPS2024 presenting our work SmallToLarge (S2L): Data-efficient Fine-tuning of LLMs! 🚀 What’s S2L? It’s a scalable data selection method that trains a small proxy model to guide fine-tuning for larger models, reducing costs while preserving performance. 👇
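A hedged sketch in the spirit of proxy-guided data selection: log each training example's loss trajectory on the small proxy, cluster the trajectories, and sample evenly across clusters so the selected subset covers diverse learning behaviors. The trajectories below are synthetic placeholders, not losses from a real proxy run.

```python
# Hedged sketch of proxy-guided data selection in the spirit of S2L:
# cluster per-example loss trajectories recorded on a small proxy model,
# then sample a balanced subset across clusters for fine-tuning the large model.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_examples, n_checkpoints = 1000, 10

# Synthetic stand-in for per-example loss trajectories across proxy checkpoints.
loss_trajectories = rng.gamma(shape=2.0, scale=1.0, size=(n_examples, n_checkpoints))

def select_subset(trajectories: np.ndarray, n_clusters: int = 20, budget: int = 100) -> np.ndarray:
    """Cluster loss trajectories and draw roughly equal numbers of examples per cluster."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(trajectories)
    per_cluster = budget // n_clusters
    chosen = []
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        chosen.extend(rng.choice(members, size=min(per_cluster, len(members)), replace=False))
    return np.array(chosen)

subset = select_subset(loss_trajectories)
print(f"selected {len(subset)} of {n_examples} examples for fine-tuning the large model")
```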
I'll be at #NeurIPS Vancouver between 12/9 and 12/13. Presenting this work on Thursday 4:30pm - 7:30pm at East Exhibit Hall A-C #3509. Old and new friends are welcome to chat about multimodal AI research and more! My DM is open :)
How do you pick a good number of visual tokens? Too few and performance suffers; too many and compute grows quadratically. In this work, we introduce a model that works with an elastic number of visual tokens. arXiv: arxiv.org/abs/2405.19315
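A minimal sketch of the general idea of compressing image patch features into an adjustable number of visual tokens via learned queries and cross-attention; the module layout and hyperparameters are illustrative, not the paper's architecture.

```python
# Minimal sketch: compress image patch features into an elastic number of
# visual tokens with learned queries and cross-attention. Illustrative only,
# not the model from arxiv.org/abs/2405.19315.
import torch
import torch.nn as nn

class ElasticVisualTokenizer(nn.Module):
    def __init__(self, dim: int = 256, max_tokens: int = 64, num_heads: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(max_tokens, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_features: torch.Tensor, num_tokens: int) -> torch.Tensor:
        """Compress [B, N_patches, D] features into the first `num_tokens` query slots."""
        q = self.queries[:num_tokens].unsqueeze(0).expand(patch_features.size(0), -1, -1)
        tokens, _ = self.cross_attn(q, patch_features, patch_features)
        return tokens  # [B, num_tokens, D] visual tokens for the LLM

patches = torch.randn(2, 576, 256)              # e.g. a 24x24 grid of patch features
tokenizer = ElasticVisualTokenizer()
print(tokenizer(patches, num_tokens=16).shape)  # few tokens: cheap
print(tokenizer(patches, num_tokens=64).shape)  # more tokens: better detail
```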
In a collaboration between #NVIDIA and #UCSD, we built NaVILA, the foundational navigation VLA for humanoids and quadrupeds. It is enabled by a 2-level framework, a direction I am pushing a lot these days: 1⃣ A VLA that outputs mid-level actions, like "turn left 15 degrees". 2⃣ A…
Without any maps or prior knowledge of the scene, our humanoid and quadruped can now follow human language instructions to navigate anywhere outdoors and in any house we visit!🔥🔥🔥 Introducing NaVILA, a 2-level navigation foundation model (mid-level action VLA + locomotion skills)…
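A hypothetical sketch of the two-level split described in the two posts above: a high-level VLA emits mid-level commands in language, and a low-level locomotion policy turns each command into a short sequence of skill calls. Both components here are toy stand-ins, not the NaVILA models.

```python
# Hypothetical two-level navigation stack: a high-level VLA emits mid-level
# commands in language ("turn left 15 degrees", "move forward 0.5 meters"),
# and a low-level locomotion policy converts each command into skill calls.
# Both functions are toy stand-ins, not the NaVILA models.
import re

def high_level_vla(instruction: str, observation: str) -> str:
    # Stand-in for the vision-language-action model: one mid-level command per step.
    return "turn left 15 degrees" if "door" in observation else "move forward 0.5 meters"

def locomotion_policy(command: str) -> list[str]:
    """Translate a mid-level command into a short sequence of low-level skill calls."""
    numbers = [float(x) for x in re.findall(r"[\d.]+", command)]
    if command.startswith("turn"):
        return [f"yaw_step({numbers[0] / 5:.1f} deg)"] * 5     # toy discretization
    return [f"walk_step({numbers[0] / 4:.2f} m)"] * 4

instruction = "go to the kitchen and stop by the fridge"
for observation in ["open hallway", "door on the left"]:
    command = high_level_vla(instruction, observation)
    print(command, "->", locomotion_policy(command))
```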
📣 New Paper: Verbalized Representation Learning (VRL) VRL bridges prompt engineering and representation learning to enable automatic interpretable feature extraction — all without gradient descent! 🔥 +29% over SOTA 📊 95% less data arxiv.org/abs/2411.18651 @uclanlp (1/n)
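A hedged sketch of the verbalized-feature idea: have a VLM propose natural-language features that separate a few labeled examples, then score every image against those features to get an interpretable feature vector, with no gradient descent on the extractor. Both VLM calls are stubs, not the VRL pipeline or its prompts.

```python
# Hedged sketch of verbalized feature extraction: a VLM proposes natural-language
# features that distinguish a handful of labeled examples, then every image is
# scored against those features to yield an interpretable representation.
# Both VLM calls are stubs, not the VRL pipeline.
import numpy as np

def propose_verbal_features(examples: list[dict]) -> list[str]:
    # Stand-in for a VLM prompted to contrast examples from different classes.
    return ["has visible wings", "is photographed outdoors", "has a striped pattern"]

def vlm_yes_no(image: str, feature: str) -> float:
    # Stand-in for asking a VLM "does this image show <feature>?" (1.0 = yes).
    return float((abs(hash((image, feature))) % 100) / 100 > 0.5)

def extract_features(images: list[str], verbal_features: list[str]) -> np.ndarray:
    """Each image becomes a vector of yes/no answers -- an interpretable representation."""
    return np.array([[vlm_yes_no(img, f) for f in verbal_features] for img in images])

labeled = [{"image": "bird_1.jpg", "label": "bird"}, {"image": "cat_1.jpg", "label": "cat"}]
features = propose_verbal_features(labeled)
X = extract_features(["bird_2.jpg", "cat_2.jpg", "cat_3.jpg"], features)
print(features)
print(X)   # rows are images, columns are human-readable features
```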
Can VLMs improve 𝘁𝗵𝗲𝗺𝘀𝗲𝗹𝘃𝗲𝘀💪? We propose🔥𝗩𝗜𝗦𝗖𝗢, a benchmark to evaluate VLMs’ 𝗰𝗿𝗶𝘁𝗶𝗾𝘂𝗲 and 𝗰𝗼𝗿𝗿𝗲𝗰𝘁𝗶𝗼𝗻 capabilities, towards the higher goal of VLMs autonomous self-improvement. 🌐Project: visco-benchmark.github.io 📄Paper: arxiv.org/abs/2412.02172
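For intuition, a hypothetical critique-then-correct loop of the kind such a benchmark evaluates: the VLM answers, critiques its own answer, then revises, and both the critique and the correction are scored. The scoring and the VLM call below are toy stubs, not the VISCO protocol.

```python
# Hypothetical critique-then-correct evaluation loop: the VLM answers, critiques
# its own answer, then revises. The VLM call and the scoring are toy stubs,
# not the VISCO benchmark protocol.

def vlm(prompt: str) -> str:
    # Stand-in for a vision-language model call (image omitted for brevity).
    return f"<model output for: {prompt[:40]}...>"

def evaluate_example(question: str, initial_answer: str, gold_answer: str) -> dict:
    critique = vlm(f"Question: {question}\nAnswer: {initial_answer}\n"
                   f"Critique each reasoning step as correct or incorrect.")
    corrected = vlm(f"Question: {question}\nAnswer: {initial_answer}\n"
                    f"Critique: {critique}\nGive a corrected answer.")
    return {
        "critique_score": float("incorrect" in critique.lower()),          # toy metric
        "correction_score": float(gold_answer.lower() in corrected.lower()),  # toy metric
    }

print(evaluate_example("How many chairs are in the image?", "There are 3 chairs.", "4"))
```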
[NeurIPS D&B Oral] Embodied Agent Interface: Benchmarking LLMs for Embodied Agents A single line of code to evaluate your model! 🌟Standardize Goal Specifications: LTL 🌟Standardize Modules and Interfaces: 4 modules, 438 tasks, 1475 goals 🌟Standardize Fine-grained Metrics: 18…
🎬Meet SlowFast-VGen: an action-conditioned long video generation system that learns like a human brain! 🧠Slow learning builds the world model, while fast learning captures memories - enabling incredibly long, consistent videos that respond to your actions in real-time.…
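A toy sketch of the slow/fast split as described in the post: a slow loop nudges shared world-model parameters across many training videos, while a fast loop writes per-episode memory during generation itself so long rollouts stay consistent. The numpy "model" is illustrative, not the SlowFast-VGen architecture.

```python
# Toy sketch of slow vs. fast learning for action-conditioned long video generation.
# Slow loop: small shared-parameter updates across many episodes (the world model).
# Fast loop: per-episode memory written during generation to keep rollouts consistent.
# Purely illustrative, not the SlowFast-VGen architecture.
import numpy as np

rng = np.random.default_rng(0)
slow_weights = np.zeros(8)           # world-model parameters, updated slowly

def slow_update(batch: np.ndarray, lr: float = 0.01) -> None:
    """Slow learning: small gradient-like steps shared across all episodes."""
    global slow_weights
    slow_weights += lr * (batch.mean(axis=0) - slow_weights)

def generate_episode(actions: list[str], steps: int = 5) -> list[np.ndarray]:
    """Fast learning: an episodic memory is written during generation itself."""
    fast_memory: list[np.ndarray] = []
    frames = []
    for t in range(steps):
        action = actions[t % len(actions)]
        action_bias = 0.2 if "forward" in action else -0.2   # toy action conditioning
        context = np.mean(fast_memory, axis=0) if fast_memory else np.zeros_like(slow_weights)
        frame = slow_weights + 0.5 * context + action_bias + 0.1 * rng.normal(size=8)
        fast_memory.append(frame)     # remember what was just generated
        frames.append(frame)
    return frames

for _ in range(100):                  # slow loop over "training videos"
    slow_update(rng.normal(loc=1.0, size=(16, 8)))
frames = generate_episode(actions=["move forward", "turn left"])
print(len(frames), frames[-1].round(2))
```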