Ming-Yu Liu
@liu_mingyu
Tweets are my own.
For people looking for a diffusion-based video generator to finetune or post-train for their downstream physical AI applications, we just released our latest one. We have 2 models: 2B and 14B. 2B for fast prototyping and 14B for better quality. The license is fully open. Give it…
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly…
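For builders who want to try post-training: below is a minimal sketch of fetching a Cosmos-Predict2 checkpoint from Hugging Face. The repo id is my assumption; the exact published model names and post-training entry points live in the nvidia-cosmos GitHub repo.

```python
# Minimal sketch: download a Cosmos-Predict2 checkpoint for local post-training.
# The repo id below is an assumption; check the official release for exact names.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="nvidia/Cosmos-Predict2-2B-Video2World",  # assumed id; a 14B variant trades speed for quality
    local_dir="checkpoints/cosmos-predict2-2b",
)
print(f"Checkpoint files in: {ckpt_dir}")
```

The 2B model is the sensible default while iterating on a fine-tuning recipe; swap in the 14B checkpoint once the pipeline works.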
We at @1x_tech with @JackMonas are excited to announce the ICCV phase of our 1X World Model Challenge: huggingface.co/spaces/1x-tech… Participate in the Compression and Sampling tracks for an $8k prize pool & train generative models for cool robot results like: 1x.tech/discover/redwo…
🤖🌎 We are organizing a workshop on Robotics World Modeling at @corl_conf 2025! We have an excellent group of speakers and panelists, and are inviting you to submit your papers with a July 13 deadline. Website: robot-world-modeling.github.io
We build Cosmos-Predict2 as a world foundation model for Physical AI builders — fully open and adaptable. Post-train it for specialized tasks or different output types. Available in multiple sizes, resolutions, and frame rates. 📷 Watch the repo walkthrough…
Big congrats to @ericjang11 and the team on the 1X World Model release. Verification is an important part of producing production AI models. Given the diverse nature of work environments, it makes a lot of sense to leverage a world model to help with policy evaluation.
We've made substantial progress on our action-conditioned video generation model, aka the "1X World Model", and we show that we can use it to evaluate robot policies instead of running experiments in the real world. Check it out!
Check out our latest HF demo on 3D generation with part annotation.
Nvidia cooked with PartPacker 3D Generation: a new method to create 3D objects from a single image, with each part separate and easy to edit 🔥 ⬇️ Demo available on Hugging Face
3D asset generation has advanced a lot in the past few years. Generating a holistic 3D asset is no longer a challenging problem. What's next for 3D generation? We believe that generating a 3D asset with individual parts defined is the next frontier. With the parts, we can start…
Happy to share our work PartPacker: We enable one-shot image-to-3D generation with any number of parts! Project page: research.nvidia.com/labs/dir/partp… Demo: huggingface.co/spaces/nvidia/… Code: github.com/NVlabs/PartPac…
Generating 3D models with parts is a key step toward scalable, interactive simulation environments. Check out our work — PartPacker — and the concurrent project, PartCrafter! PartPacker: github.com/NVlabs/PartPac… PartCrafter: wgsxm.github.io/projects/partc…
We post-trained a reasoning model to reason about whether a video is real or generated. It might be very useful as a critic to improve video generators. Take a look. @NVIDIAAI
Cosmos-Reason1 has exciting updates 💡 Now it understands physical reality — judging videos as real or fake! Check out the resources👇 Paper: arxiv.org/abs/2503.15558 Huggingface: huggingface.co/nvidia/Cosmos-… Code: github.com/nvidia-cosmos/… Project page: research.nvidia.com/labs/dir/cosmo… (1/n)
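As a sketch of how one might query the model for the real-vs-generated judgment described above, assuming Cosmos-Reason1 loads through the standard transformers vision-language chat interface (the model id and video-input handling here are assumptions; the linked repo documents the supported inference path):

```python
# Hedged sketch: ask Cosmos-Reason1 whether a clip is real footage or generated.
# Assumes a recent transformers build with video chat-template support.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "nvidia/Cosmos-Reason1-7B"  # assumed id from the HF collection
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "path": "clip.mp4"},
        {"type": "text", "text": "Is this video real or AI-generated? Reason step by step."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```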
Check out our new work on Direct Discriminative Optimization for improving GenAI models.
1/💡New paper from NVIDIA&Tsinghua @ICML2025 Spotlight! Direct Discriminative Optimization (DDO) enables GAN-style finetuning of diffusion/autoregressive models without extra networks. SOTA achieved on ImageNet-512! Website: research.nvidia.com/labs/dir/ddo/ Code: github.com/NVlabs/DDO
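The core trick, as I read the announcement: the log-ratio between the finetuned model and a frozen reference copy acts as an implicit discriminator, so GAN-style training needs no extra network. A toy sketch of that objective follows (my paraphrase, not the authors' code; for diffusion models the paper substitutes likelihood surrogates with its own weighting):

```python
# Toy sketch of the DDO idea: the implicit discriminator is
# log p_theta(x) - log p_ref(x), trained with the usual GAN logistic loss.
import torch
import torch.nn.functional as F

def ddo_loss(log_ratio_real, log_ratio_fake, beta=1.0):
    """log_ratio_* = log p_theta(x) - log p_ref(x), on real data vs. model samples."""
    loss_real = -F.logsigmoid(beta * log_ratio_real).mean()   # raise the ratio on real data
    loss_fake = -F.logsigmoid(-beta * log_ratio_fake).mean()  # lower it on generated samples
    return loss_real + loss_fake

# Toy usage with made-up log-ratios:
print(ddo_loss(torch.randn(8) + 1.0, torch.randn(8) - 1.0))
```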
We released Cosmos-Reason1 code, model, and part of the data! We also updated our paper to include a section about our RL infra: arxiv.org/abs/2503.15558 - Code: github.com/nvidia-cosmos/… - Model and Data: huggingface.co/collections/nv… - Blog: developer.nvidia.com/blog/curating-…
Is the video playing forward or backward? None of the current AI models can answer this simple question correctly.
Nvidia just dropped Describe Anything on Hugging Face: Detailed Localized Image and Video Captioning
Over 4 years into our journey bridging Convolutions and Transformers, we introduce Generalized Neighborhood Attention—Multi-dimensional Sparse Attention at the Speed of Light: github.com/SHI-Labs/NATTEN A collaboration with the best minds in AI and HPC. 🐝🟩🟧 @gtcomputing @nvidia
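For reference, the dense 2D case in NATTEN's module-style API looks roughly like this (based on earlier NATTEN releases; the new Generalized Neighborhood Attention kernels may be exposed through a different interface, so treat this as a sketch and check the repo):

```python
# Sketch: 2D neighborhood attention, where each token attends only to a
# kernel_size x kernel_size window around itself (module API from older NATTEN).
import torch
from natten import NeighborhoodAttention2D

attn = NeighborhoodAttention2D(dim=128, kernel_size=7, dilation=1, num_heads=4)
x = torch.randn(2, 32, 32, 128)  # (batch, height, width, channels)
y = attn(x)                      # same shape out; attention is local, hence sparse
print(y.shape)                   # torch.Size([2, 32, 32, 128])
```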
Introducing the Describe Anything Model (DAM), a powerful Multimodal LLM that generates detailed descriptions for user-specified regions in images or videos using points, boxes, scribbles, or masks. Open-source code, models, demo, data, and benchmark at: describe-anything.github.io
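All of the region prompts (points, boxes, scribbles) ultimately reduce to a binary mask over pixels. Below is a hypothetical illustration of that interaction pattern, where `DescribeAnything` is a stand-in name and not the repo's real API (see describe-anything.github.io for the actual code):

```python
# Hypothetical sketch of the DAM interaction: image + region mask in,
# localized caption out. Only box_to_mask below is concrete; the model
# call is a stand-in, not the repo's real interface.
import numpy as np

def box_to_mask(height, width, box):
    """Convert an (x0, y0, x1, y1) box into a binary region mask."""
    mask = np.zeros((height, width), dtype=np.uint8)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = 1
    return mask

mask = box_to_mask(480, 640, (100, 120, 300, 360))
# caption = DescribeAnything().describe("frame.jpg", mask)  # stand-in call
```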
Video/Physics Generative AI was bottlenecked by diffusion runtime: 5s of video used to take minutes. My student @AliHassaniJr @gtcomputing helped scale the full 35-step Cosmos 7B DiT 40× to real-time on Blackwell NVL72, in collab w/ @nvidia @liu_mingyu's team. Congrats, just the beginning! 🐝🚀
Nvidia just released Cosmos-Transfer1 on Hugging Face: Conditional World Generation with Adaptive Multimodal Control