Yuncong Yang
@YuncongYY
First-year CS PhD student at UMass Amherst, advised by @gan_chuang | Intern @MSFTResearch
Test-time scaling nailed code & math—next stop: the real 3D world. 🌍 MindJourney pairs any VLM with a video-diffusion World Model, letting it explore an imagined scene before answering. One frame becomes a tour—and the tour leads to new SOTA in spatial reasoning. 🚀 🧵1/
Just paid ¥4.99 to a site that "predicts" NeurIPS acceptance from your ratings and confidence scores. Total scam: basically a random number generator. 🤡 I should build my own startup for this. Pretty sure I could make a fortune off researchers' anxiety these days. #NeurIPS2025

VLMs struggle badly to interpret 3D from 2D observations, but what if they had a good mental model of the world? Check out our MindJourney, a test-time scaling approach for spatial reasoning in the 3D world. Without any task-specific training, MindJourney imagines (acts mentally) step-by-step…
Spatial reasoning from a single image is inherently difficult, but it becomes significantly easier when leveraging a controlled world model, analogous to the mental models used by humans! Code: github.com/UMass-Embodied…
VLMs often struggle with physical reasoning tasks such as spatial reasoning. Excited to share how we can use world models + test-time search to zero-shot improve spatial reasoning in VLMs!
MindJourney Test-Time Scaling with World Models for Spatial Reasoning
Thanks @_akhaliq for sharing our work! MindJourney fuses a world model with any VLM, so the model can first imagine walking around before it answers. From “one snapshot” to “what if I stand over there?”—and suddenly spatial reasoning hits SOTA. 🚀 Project Page:…
You can install anycoder as a Progressive Web App on your device. Visit huggingface.co/spaces/akhaliq…, click Settings in the footer, follow the instructions, and click the install button in your browser's address bar.
📣 Excited to announce SpaVLE: #NeurIPS2025 Workshop on Space in Vision, Language, and Embodied AI! 👉 …vision-language-embodied-ai.github.io 🦾Co-organized with an incredible team → @fredahshi · @maojiayuan · @DJiafei · @ManlingLi_ · David Hsu · @Kordjamshidi 🌌 Why Space & SpaVLE? We…
I hope humans and robots live peacefully in the Virtual Community. Great work by @QinhongZhou! #DetroitBecomeHuman #AI #Robotics
World Simulator, reimagined — now alive with humans, robots, and their vibrant society unfolding in 3D real-world geospatial scenes across the globe! 🚀 One day soon, humans and robots will co-exist in the same world. To prepare, we must address: 1️⃣ How can robots cooperate or…
Nashville’s food is hands-down the highlight of CVPR for me so far. Sending a meat-lover’s salute to the South 🤤 P1 Hattie B P2 Peg Leg Porker #CVPR2025


Watched the notorious @celtics game while working on my NeurIPS submission. It took me 2½ hours to realize there's something even more painful than rushing a NeurIPS paper. #Celtics #NeurIPS2025
TesserAct is out on Hugging Face: Learning 4D Embodied World Models