Ian Huang
@IanHuang3D
AI PhD @StanfordAILab | Ex-SR @GoogleDeepMind Ex-SR @RealityLabs | Multimodal models for 3D | http://ianhuang.ai
🏡Building realistic 3D scenes just got smarter! Introducing our #CVPR2025 work, 🔥FirePlace, a framework that enables Multimodal LLMs to automatically generate realistic and geometrically valid placements for objects into complex 3D scenes. How does it work?🧵👇
New blog post about asymmetry of verification and "verifier's law": jasonwei.net/blog/asymmetry… Asymmetry of verification–the idea that some tasks are much easier to verify than to solve–is becoming an important idea as we have RL that finally works generally. Great examples of…
We are presenting 17:00-19:00 today at Poster 267 in ExHall D for #CVPR25! Come and check out the first #VLM #3D #Graphics Benchmark! 📣📣📣
Which multimodal LLM should you be using to edit graphics in Blender? Today, we’re releasing our #CVPR2025 Highlight🌟 work, #BlenderGym 🏋️‍♀️, the first agentic 3D graphics editing benchmark that will tell you exactly how multimodal LLMs compare in their Blender-editing skills.…
📣Happening at Poster #269 TODAY between 10:30am and 12:30pm at #CVPR2025! Come learn about multimodal #VLM reasoning for #3D scene generation!
If you're wondering which multimodal LLMs you should be using to build 3D graphics agents 🧑‍💻, check out our #CVPR2025 Highlight work, BlenderGym -- not only does BlenderGym benchmark the top open and closed models, it also reveals a trick about *how* you should be allocating…
Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow) github.com/lakonik/gmflow GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.
📣 Happy to share that FirePlace got a #CVPR2025 Highlight! See you all in Nashville! 🎷
Impressive demo and work! Amazing stuff going on with MLLMs.
Thanks for sharing our work! And yes — sound ON is a good idea.
🚨CVPR 2025 Paper Alert 🚨 ➡️Paper Title: FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement 🌟Few pointers from the paper 🎯Scene generation with 3D assets presents a complex challenge, requiring both high-level semantic understanding and…
y'all wanted Ghibli, so here it is. FirePlace 3D scene 🏠 -> render 📸-> ChatGPT4o + prompting 🎨 fireplace3d.github.io
neat-looking pipeline for mixing a vision LLM with 3D objects: it simply asks the MLLM for things like object size, rotation, surface alignment constraints, etc., and then applies them to the 3D scene overlaid on the image & anchor object
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement