Xun Huang
@xunhuang1995
Interactive Video World Model @AdobeResearch, Adjunct Professor @CarnegieMellon, ex-@NVIDIAAI, Ph.D. @Cornell, Snap & NVIDIA & Adobe Fellowship Recipient.
Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
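To make the idea concrete, here is a minimal, self-contained sketch of training-time autoregressive unrolling with a KV cache (all class and function names are hypothetical stand-ins, not the released Self-Forcing code): the model generates each frame conditioned on its own previous outputs, caching keys/values as it goes, and the loss is backpropagated through the whole rollout so training sees the same distribution as inference.

```python
# Toy sketch of the core Self-Forcing idea (hypothetical names, not the released code):
# unroll the model on its own outputs during training, reusing a KV cache,
# then apply a loss on the full rollout.
import torch
import torch.nn as nn

class ToyCausalFrameModel(nn.Module):
    """Stand-in for a causal video diffusion transformer (one token per frame)."""
    def __init__(self, dim=64):
        super().__init__()
        self.proj_kv = nn.Linear(dim, 2 * dim)
        self.proj_q = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def step(self, frame, kv_cache):
        # Append this frame's keys/values to the cache (the KV-caching trick).
        k, v = self.proj_kv(frame).chunk(2, dim=-1)
        kv_cache["k"].append(k)
        kv_cache["v"].append(v)
        K = torch.stack(kv_cache["k"], dim=1)        # (B, T, D)
        V = torch.stack(kv_cache["v"], dim=1)
        q = self.proj_q(frame).unsqueeze(1)          # (B, 1, D)
        attn = torch.softmax(q @ K.transpose(1, 2) / K.shape[-1] ** 0.5, dim=-1)
        return self.out((attn @ V).squeeze(1))       # predicted next frame

def self_forcing_rollout(model, first_frame, num_frames):
    """Unroll the model on its own outputs, exactly as at inference time."""
    kv_cache = {"k": [], "v": []}
    frames, frame = [first_frame], first_frame
    for _ in range(num_frames - 1):
        frame = model.step(frame, kv_cache)          # condition on generated frames
        frames.append(frame)
    return torch.stack(frames, dim=1)                # (B, T, D), differentiable

model = ToyCausalFrameModel()
rollout = self_forcing_rollout(model, torch.randn(2, 64), num_frames=8)
loss = rollout.pow(2).mean()                         # placeholder for a real objective
loss.backward()                                      # gradients flow through the unroll
```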
As an aside, we didn't announce on Friday because we respected the IMO Board's original request that all AI labs share their results only after the official results had been verified by independent experts and the students had rightly received the acclaim they deserved.
We should have called it "scaling up rollouts", not RL. RL is a necessary evil given the discrete nature of language. My intuition tells me that using RL for continuous data (images, videos, audio), where differentiable supervision is readily available, is a terrible idea.
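A toy contrast of the two gradient estimators this intuition is about (my own illustration, not from the thread; the reward function is made up): direct backprop through a differentiable reward versus an RL-style score-function update that only sees a scalar outcome.

```python
# Toy illustration: for continuous outputs, a differentiable reward lets you
# backprop directly, while a REINFORCE-style estimator only nudges parameters
# based on a black-box scalar and is much noisier.
import torch

theta = torch.zeros(2, requires_grad=True)            # "generator" parameters

def reward(x):                                         # differentiable quality measure
    return -(x - torch.tensor([1.0, -2.0])).pow(2).sum()

# (a) Direct differentiable supervision: backprop through a reparameterized sample.
eps = torch.randn(2)
x = theta + 0.1 * eps
(-reward(x)).backward()
grad_direct = theta.grad.clone()
theta.grad.zero_()

# (b) RL-style score-function update: only the log-prob is differentiated,
# the reward is treated as a black-box scalar.
dist = torch.distributions.Normal(theta, 0.1)
x = dist.sample()                                      # non-differentiable sample
(-(reward(x).detach() * dist.log_prob(x).sum())).backward()
grad_rl = theta.grad.clone()

print(grad_direct, grad_rl)                            # RL gradient has far higher variance
```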
Scaling up RL is all the rage right now; I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
Welcome to the Era of Real-Time Experience
Real-time video generation is arriving faster than most realize. StreamDiT, CausVid, Self-Forcing, and Seaweed APT2 represent a new class of model architectures that enable temporal consistency and fast frame rates. A new era is emerging.
Wow, nearly half a million videos have been created with CausVid! Huge thanks to @multimodalart and the incredible open-source community for expanding the codebase and building such impressive demos!
(for the curious ones, 1K likes to this Space translates to almost half a million videos generated on the Space! 🤯 go check it out: huggingface.co/spaces/multimo…)
🚀 CausVid (causvid.github.io) is powering the first real-time, audio-driven AI avatars at @character_ai — amazing work! Real-time video models open the door to countless interactive experiences. Excited to see what comes next! blog.character.ai/character-ais-…
Self-Forcing is astonishingly powerful: beyond real-time video generation, it could dramatically boost language models, audio, and reinforcement learning | shi3z @shi3z note.com/shi3zblog/n/n8…
this is not a drill 🚨, real-time open-source video generation is here 🔥 Self-Forcing, a real-time video model distilled from Wan 2.1 by @Adobe, is out, and they open-sourced it 🐐 I've built a live real-time demo on @huggingface Spaces 📹💨
🚀 Excited to introduce our latest work GRESO: a method that identifies and skips millions of uninformative prompts before rollout and achieves up to 2.0x wall-clock time speedup in training. More rollouts lead to better model performance, but they’re also a major bottleneck in…
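The general idea of pre-rollout prompt filtering can be sketched as follows (a hand-wavy illustration with made-up names and a simple zero-variance heuristic, not GRESO's actual selection rule): skip prompts whose recent rewards show essentially no spread, since all-correct or all-wrong rollout groups contribute almost no gradient signal in group-relative RL.

```python
# Hedged sketch of pre-rollout prompt filtering (not GRESO's actual criterion):
# before spending compute on rollouts, skip prompts whose recent reward history
# shows (near-)zero variance.
from collections import defaultdict, deque
import statistics

reward_history = defaultdict(lambda: deque(maxlen=8))   # prompt_id -> recent rewards

def is_informative(prompt_id, min_history=4, eps=1e-3):
    hist = reward_history[prompt_id]
    if len(hist) < min_history:
        return True                                      # not enough evidence yet, keep it
    return statistics.pvariance(hist) > eps              # skip saturated prompts

def select_prompts(batch):
    # `batch` is assumed to be a list of dicts with an "id" field (hypothetical schema).
    return [p for p in batch if is_informative(p["id"])]

def record(prompt_id, reward):
    # Call after each rollout so future batches can be filtered.
    reward_history[prompt_id].append(reward)
```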
NVIDIA wants to sell you an NVL72 rack ($3M) so you can do real-time video generation 😅 Good thing: you don't need it. Self-Forcing does the job with a single 4090, and with better quality 😊 self-forcing.github.io

We built a real-time audio-video world model (& showed it off at CVPR)! 🎥 360p 🔊 with synced audio ⚡️ 10 fps on a gaming laptop (faster than our H100s!) 📦 image+audio VAE & a causal diffusion WM 🔥 built and trained in under 72 hours! See below for our technical blog 1/4 🧵
Hello @MiniMax__AI, exciting model, but the claim of better reasoning scaling than @deepseek_ai and @Alibaba_Qwen is questionable. Nice try on reasoning longer to reach SOTA, but using FLOPs to quantify the cost of test-time scaling doesn't work for hybrid models 🫣 @chenzhuoming911 has…
Day 1/5 of #MiniMaxWeek: We're open-sourcing MiniMax-M1, our latest LLM, setting new standards in long-context reasoning.
- World's longest context window: 1M-token input, 80k-token output
- State-of-the-art agentic use among open-source models
- RL at unmatched efficiency:…
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation.
🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46%.
🌐 Website: multiverse4fm.github.io
🧵 1/n