Yin Cui
@YinCuiCV
Research Scientist @NVIDIA | Formerly @Google, @Cornell | Views are my own
Introducing the Describe Anything Model (DAM), a powerful Multimodal LLM that generates detailed descriptions for user-specified regions in images or videos using points, boxes, scribbles, or masks. Open-source code, models, demo, data, and benchmark at: describe-anything.github.io
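The region prompts DAM accepts (points, boxes, scribbles, masks) all denote a pixel region of the image. As a minimal illustrative sketch — not DAM's actual preprocessing, whose details live in the open-source repo — a point or box prompt can be rasterized into the same binary-mask form a mask prompt already has; the function and prompt schema below are hypothetical:

```python
import numpy as np

def region_prompt_to_mask(prompt, shape):
    """Rasterize a user region prompt into a binary mask.

    Illustrative sketch only; DAM's real pipeline may differ.
    prompt: {"type": "point"|"box"|"mask", ...} (hypothetical schema)
    shape:  (height, width) of the target image.
    """
    h, w = shape
    mask = np.zeros((h, w), dtype=bool)
    if prompt["type"] == "point":
        # Mark a small square neighborhood around the clicked pixel.
        x, y = prompt["xy"]
        r = prompt.get("radius", 3)
        mask[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1] = True
    elif prompt["type"] == "box":
        # Fill the axis-aligned box given as (x0, y0, x1, y1).
        x0, y0, x1, y1 = prompt["xyxy"]
        mask[y0:y1, x0:x1] = True
    elif prompt["type"] == "mask":
        # Already a mask; just coerce to boolean.
        mask = np.asarray(prompt["mask"], dtype=bool)
    return mask
```

A scribble prompt would rasterize the same way (stroke pixels set to True); it is omitted here for brevity.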
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Excited to share that Describe Anything has been accepted at ICCV 2025! 🎉 Describe Anything Model (DAM) is a powerful Multimodal LLM that generates detailed descriptions for user-specified regions in images or videos using points, boxes, scribbles, or masks. Open-source code,…
Nvidia just dropped Describe Anything on Hugging Face: detailed localized image and video captioning.
We build Cosmos-Predict2 as a world foundation model for Physical AI builders — fully open and adaptable. Post-train it for specialized tasks or different output types. Available in multiple sizes, resolutions, and frame rates. 📷 Watch the repo walkthrough…
Introducing Gemini CLI, a light and powerful open-source AI agent that brings Gemini directly into your terminal. >_ Write code, debug, and automate tasks with Gemini 2.5 Pro with industry-leading high usage limits at no cost.
🎉 ComfyUI now natively supports NVIDIA’s Cosmos-Predict2 model family! Cosmos-Predict2 brings high-fidelity, physics-aware image generation and Video2World (image-to-video) generation. Another reality inside ComfyUI!
🚀 We're releasing Cosmos-Predict2 — our developer-first, top-performing world foundation models for Physical AI! 🔗 huggingface.co/blog/nvidia/co… 👩💻 Pretrained weights, inference, and post-training scripts available. 💬 Try it out and share your feedback! - code:…
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly…
Happy to share our work PartPacker: We enable one-shot image-to-3D generation with any number of parts! Project page: research.nvidia.com/labs/dir/partp… Demo: huggingface.co/spaces/nvidia/… Code: github.com/NVlabs/PartPac…
Cosmos-Predict2 is our latest open video foundation model for Physical AI! research.nvidia.com/labs/dir/cosmo… If you’re at #cvpr2025, I would also love to chat with you about world models!
The Vision Meets Physics workshop just started! Come join us!
Join us at the 1st workshop on Vision Meets Physics: Synergizing Physical Simulation and Computer Vision at #CVPR2025 tomorrow! Thought-provoking talks and expert insights from leading researchers that YOU CANNOT MISS! 📍104A ⏰ 8:45am June 12th visionmeetphysics.github.io
The WorldModelBench workshop is happening tomorrow (June 12th) at #CVPR2025! We have an exciting series of talks, do attend! Place: Room 108 Time: Morning Session #NVIDIAResearch
Join us at the WorldModelBench workshop at #CVPR2025 where we'll tackle systematic evaluation of World Models! Focus: benchmarks, metrics, downstream tasks, and safety. Submit papers now: worldmodelbench.github.io
If you are attending #CVPR2025 tomorrow, please visit two highly relevant workshops organized by our team members: - Vision Meets Physics: visionmeetphysics.github.io - Benchmarking World Models: worldmodelbench.github.io
Many core contributors are attending #CVPR2025. Let’s discuss the future of world models!
Cosmos-Reason1 has exciting updates 💡 Now it understands physical reality — judging videos as real or fake! Check out the resources👇 Paper: arxiv.org/abs/2503.15558 Huggingface: huggingface.co/nvidia/Cosmos-… Code: github.com/nvidia-cosmos/… Project page: research.nvidia.com/labs/dir/cosmo… (1/n)