Hanyang Chen
@hc81Jeremy
M.S. in CS @ UIUC; BSc @ HKU. Do cool things with cool people. #Multimodal_AI #AI_Reasoning
Grateful for the chance to present EmbodiedBench at ICML as an Oral. A rewarding experience full of learning. Thanks to @RuiYang70669025 @hengjinlp @jyzhang1208 @huan_zhang12 Mark_Zhao @ManlingLi_ Tong_Zhang and many others who made it possible. See you next time.




Thanks, Prof. Yu, for attending our talk @Zhou_Yu_AI. Happy ICML25🇨🇦
Good work, enjoyed the talk.
My coauthor @hc81Jeremy will present EmbodiedBench at ICML 2025! 🤖 Oral Session 6A 📍 West Hall C 🕧 July 17, 3:30-3:45 pm PDT 📌 Poster Session 📍 East Hall A-B #E-2411 🕜 July 17, 4:30-7 pm PDT Come say hi and let's talk about VLM agent training, evaluation, and benchmarking! 😀
I'll be at ICML next week, presenting our paper on Wasserstein Policy Optimization on Tuesday! If you're in Vancouver, come say hi!
Excited to share that EmbodiedBench was selected for an Oral at ICML 2025! We recently added results for new models (InternVL3, Gemma3, Ovis2) and released a large agent trajectory dataset on 🤗: embodiedbench.github.io Try training and evaluating your MLLM as an embodied agent!

How to schedule a meeting? When you ask for a meeting with others, you are asking for their time. You are asking for their most valuable, finite resource to benefit yourself (e.g., for advice, networking, questions, and opportunities). Here are some tips that I found useful.
🚀 Excited to share our latest work on Iterative-DPO for math reasoning! Inspired by DeepSeek-R1 & rule-based PPO, we trained Qwen2.5-MATH-7B on Numina-Math prompts. Our model achieves 47.0% pass@1 on AIME24, MATH500, AMC, Minerva-Math, OlympiadBench—outperforming…
Thanks, Manling, for sharing the work! Improve your VLM with EmbodiedBench.
Excited to release EmbodiedBench for VLMs! It is time to work on embodied agents using VLMs🔥 embodiedbench.github.io 🔍 1,128 tasks across 4 diverse environments 🎯 6 fine-grained evaluation capabilities (reasoning, planning, perception & more) 📊 Benchmarked on 13 top…
🔥 Exploring MLLMs as embodied generalists. 🔍 4 diverse tasks, from high-level planning to low-level manipulation 🎯 6 fine-grained evaluation capabilities, all in one MLLM. 📊 More than a benchmark: a standardized platform to spark more algorithms.
🤖Can MLLM agents reason about spatial relationships and plan atomic actions for navigation & manipulation? 🔥 Meet EmbodiedBench 🏆—the first fine-grained benchmark for MLLM-based embodied agents! 📄 Paper: arxiv.org/abs/2502.09560 🌐 Website & code: embodiedbench.github.io
EmbodiedBench is a powerful testbed for evaluating MLLM agents' reasoning, planning, and spatial understanding capabilities! 🚀 Check our code at: github.com/EmbodiedBench/… #AI #EmbodiedAI #MLLMs #Robotics #Benchmarking #Multimodal #Reasoning
🔍 What impacts MLLM agent performance? ✅ Optimal camera resolution boosts performance ✅ Detection boxes improve perception ❌ Multi-step/multiview visual inputs cause confusion ✅ Visual in-context learning outperforms text-based ICL! Check more error analysis in our paper!
📊 Key Takeaways: 1️⃣ MLLMs struggle with low-level manipulation tasks. 2️⃣ Vision is crucial for mastering low-level tasks. 3️⃣ Claude leads in high-level tasks, while GPT-4o leads in low-level tasks. 4️⃣ InternVL2.5-78B is the best open-source model overall! 🚀
🔥 What makes EmbodiedBench unique? 🔍 1,128 tasks across 4 diverse environments 🎯 6 fine-grained evaluation capabilities (reasoning, planning, perception & more) 📊 Benchmarked on 13 top MLLMs, including GPT-4o, Claude, Gemini, LLaMA, QwenVL2, InternVL2.5! One example below
Huge thanks to my amazing collaborators: @hc81Jeremy, @jyzhang1208, @ZihaoZH94437841, @qiancheng1231, @James_KKW, @qineng_wang, @ManlingLi_, @hengjinlp, @huan_zhang12