Ming-Yu Liu
@liu_mingyu
Tweets are my own.
For people looking for a diffusion-based video generator to finetune or post-train for their downstream physical AI applications, we just released our latest one. We have 2 models: 2B and 14B. 2B for fast prototyping and 14B for better quality. The license is fully open. Give it…
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly…
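For builders who want to try post-training: below is a minimal sketch of fetching a Cosmos-Predict2 checkpoint from Hugging Face. The repo id is my assumption; the exact published model names and post-training entry points live in the nvidia-cosmos GitHub repo.

```python
# Minimal sketch: download a Cosmos-Predict2 checkpoint for local post-training.
# The repo id below is an assumption; check the official release for exact names.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="nvidia/Cosmos-Predict2-2B-Video2World",  # assumed id; a 14B variant trades speed for quality
    local_dir="checkpoints/cosmos-predict2-2b",
)
print(f"Checkpoint files in: {ckpt_dir}")
```

The 2B model is the sensible default while iterating on a fine-tuning recipe; swap in the 14B checkpoint once the pipeline works.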
We at @1x_tech with @JackMonas are excited to announce the ICCV phase of our 1X World Model Challenge: huggingface.co/spaces/1x-tech… Participate in the Compression and Sampling tracks for an $8k prize pool & train generative models for cool robot results like: 1x.tech/discover/redwo…
🤖🌎 We are organizing a workshop on Robotics World Modeling at @corl_conf 2025! We have an excellent group of speakers and panelists, and are inviting you to submit your papers with a July 13 deadline. Website: robot-world-modeling.github.io
We build Cosmos-Predict2 as a world foundation model for Physical AI builders — fully open and adaptable. Post-train it for specialized tasks or different output types. Available in multiple sizes, resolutions, and frame rates. 📷 Watch the repo walkthrough…
Big congrats to @ericjang11 and the team on the 1X World Model release. Verification is an important part of producing production AI models. Given the diverse nature of work environments, it makes a lot of sense to leverage a world model to help with policy evaluation.
We've made substantial progress on our action-conditioned video generation model, aka the "1X World Model", and we show that we can use it to evaluate robot policies instead of running experiments in the real world. Check it out!
Check out our latest HF demo on 3D generation with part annotation.
Nvidia cooked with PartPacker 3D Generation: a new method to create 3D objects from a single image, with each part separate and easy to edit 🔥 ⬇️ Demo available on Hugging Face
3D asset generation has advanced a lot in the past few years. Generating a holistic 3D asset is no longer a challenging problem. What's next for 3D generation? We believe that generating a 3D asset with individual parts defined is the next frontier. With the parts, we can start…
Happy to share our work PartPacker: We enable one-shot image-to-3D generation with any number of parts! Project page: research.nvidia.com/labs/dir/partp… Demo: huggingface.co/spaces/nvidia/… Code: github.com/NVlabs/PartPac…
Generating 3D models with parts is a key step toward scalable, interactive simulation environments. Check out our work — PartPacker — and the concurrent project, PartCrafter! PartPacker: github.com/NVlabs/PartPac… PartCrafter: wgsxm.github.io/projects/partc…
We post-trained a reasoning model to reason about whether a video is real or generated. It might be very useful as a critic to improve video generators. Take a look. @NVIDIAAI
Cosmos-Reason1 has exciting updates 💡 Now it understands physical reality — judging videos as real or fake! Check out the resources👇 Paper: arxiv.org/abs/2503.15558 Huggingface: huggingface.co/nvidia/Cosmos-… Code: github.com/nvidia-cosmos/… Project page: research.nvidia.com/labs/dir/cosmo… (1/n)
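As a sketch of how one might query the model for the real-vs-generated judgment described above, assuming Cosmos-Reason1 loads through the standard transformers vision-language chat interface (the model id and video-input handling here are assumptions; the linked repo documents the supported inference path):

```python
# Hedged sketch: ask Cosmos-Reason1 whether a clip is real footage or generated.
# Assumes a recent transformers build with video chat-template support.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "nvidia/Cosmos-Reason1-7B"  # assumed id from the HF collection
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "path": "clip.mp4"},
        {"type": "text", "text": "Is this video real or AI-generated? Reason step by step."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```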
Check out our new work on Direct Discriminative Optimization for improving GenAI models.
1/💡New paper from NVIDIA&Tsinghua @ICML2025 Spotlight! Direct Discriminative Optimization (DDO) enables GAN-style finetuning of diffusion/autoregressive models without extra networks. SOTA achieved on ImageNet-512! Website: research.nvidia.com/labs/dir/ddo/ Code: github.com/NVlabs/DDO
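The core trick, as I read the announcement: the log-ratio between the finetuned model and a frozen reference copy acts as an implicit discriminator, so GAN-style training needs no extra network. A toy sketch of that objective follows (my paraphrase, not the authors' code; for diffusion models the paper substitutes likelihood surrogates with its own weighting):

```python
# Toy sketch of the DDO idea: the implicit discriminator is
# log p_theta(x) - log p_ref(x), trained with the usual GAN logistic loss.
import torch
import torch.nn.functional as F

def ddo_loss(log_ratio_real, log_ratio_fake, beta=1.0):
    """log_ratio_* = log p_theta(x) - log p_ref(x), on real data vs. model samples."""
    loss_real = -F.logsigmoid(beta * log_ratio_real).mean()   # raise the ratio on real data
    loss_fake = -F.logsigmoid(-beta * log_ratio_fake).mean()  # lower it on generated samples
    return loss_real + loss_fake

# Toy usage with made-up log-ratios:
print(ddo_loss(torch.randn(8) + 1.0, torch.randn(8) - 1.0))
```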
We released Cosmos-Reason1 code, model, and part of the data! We also updated our paper to include a section about our RL infra: arxiv.org/abs/2503.15558 - Code: github.com/nvidia-cosmos/… - Model and Data: huggingface.co/collections/nv… - Blog: developer.nvidia.com/blog/curating-…
Is the video playing forward or backward? None of the current AI models can answer this simple question correctly.
Nvidia just dropped Describe Anything on Hugging Face: Detailed Localized Image and Video Captioning
Over 4 years into our journey bridging Convolutions and Transformers, we introduce Generalized Neighborhood Attention—Multi-dimensional Sparse Attention at the Speed of Light: github.com/SHI-Labs/NATTEN A collaboration with the best minds in AI and HPC. 🐝🟩🟧 @gtcomputing @nvidia
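For reference, the dense 2D case in NATTEN's module-style API looks roughly like this (based on earlier NATTEN releases; the new Generalized Neighborhood Attention kernels may be exposed through a different interface, so treat this as a sketch and check the repo):

```python
# Sketch: 2D neighborhood attention, where each token attends only to a
# kernel_size x kernel_size window around itself (module API from older NATTEN).
import torch
from natten import NeighborhoodAttention2D

attn = NeighborhoodAttention2D(dim=128, kernel_size=7, dilation=1, num_heads=4)
x = torch.randn(2, 32, 32, 128)  # (batch, height, width, channels)
y = attn(x)                      # same shape out; attention is local, hence sparse
print(y.shape)                   # torch.Size([2, 32, 32, 128])
```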
Introducing the Describe Anything Model (DAM), a powerful Multimodal LLM that generates detailed descriptions for user-specified regions in images or videos using points, boxes, scribbles, or masks. Open-source code, models, demo, data, and benchmark at: describe-anything.github.io
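All of the region prompts (points, boxes, scribbles) ultimately reduce to a binary mask over pixels. Below is a hypothetical illustration of that interaction pattern, where `DescribeAnything` is a stand-in name and not the repo's real API (see describe-anything.github.io for the actual code):

```python
# Hypothetical sketch of the DAM interaction: image + region mask in,
# localized caption out. Only box_to_mask below is concrete; the model
# call is a stand-in, not the repo's real interface.
import numpy as np

def box_to_mask(height, width, box):
    """Convert an (x0, y0, x1, y1) box into a binary region mask."""
    mask = np.zeros((height, width), dtype=np.uint8)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = 1
    return mask

mask = box_to_mask(480, 640, (100, 120, 300, 360))
# caption = DescribeAnything().describe("frame.jpg", mask)  # stand-in call
```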
Video/Physics Generative AI was bottlenecked by diffusion runtime: 5s of video used to take minutes. My student @AliHassaniJr @gtcomputing helped scale the full 35-step Cosmos 7B DiT 40× to real-time on Blackwell NVL72, in collab w/ @nvidia @liu_mingyu's team. Congrats, just the beginning! 🐝🚀
Nvidia just released Cosmos-Transfer1 on Hugging Face: Conditional World Generation with Adaptive Multimodal Control