Yuanzhi
@yuanzhi_zhu
PhD student at École Polytechnique. Currently interested in CV and ML. Opinions are my own.
Masked Diffusion Models (MDMs) are a hot topic in generative AI 🔥: powerful but slow due to multiple sampling steps. We at @Polytechnique and @Inria introduce Di[M]O, a novel approach to distill MDMs into a one-step generator without sacrificing quality.
Movies are more than just video clips; they are stories! 🎬 We’re hosting the 1st SLoMO Workshop at #ICCV2025 to discuss Story-Level Movie Understanding & Audio Descriptions! Website: slomo-workshop.github.io Competition: huggingface.co/spaces/SLoMO-W…
Human-level intelligence does not emerge from purely visual observations. Cats and dogs are capable of exploring the world, yet they do not behave intelligently. While LLMs could be merely a shortcut, enabling them with exploration is the most promising way.
I always found it puzzling how language models learn so much from next-token prediction, while video models learn so little from next frame prediction. Maybe it's because LLMs are actually brain scanners in disguise. Idle musings in my new blog post: sergeylevine.substack.com/p/language-mod…
"RL with only one training example" and "Test-Time RL" are two recent papers that I found fascinating. In the "One Training example" paper the authors find one question and ask the model to solve it again and again. Every time, the model tries 8 times (the Group in GRPO), and…
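For readers unfamiliar with the "Group" in GRPO: each of the 8 attempts gets an advantage computed relative to the other attempts on the same question, by subtracting the group's mean reward and dividing by its standard deviation. A minimal sketch, assuming binary correctness rewards; the function name and the example rewards are hypothetical, and real implementations differ in details such as population vs. sample standard deviation.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Hypothetical helper: GRPO-style advantages for one group of rollouts.

    Each rollout's reward is normalized by the mean and (population)
    standard deviation of its own group; eps avoids division by zero.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical rewards for 8 attempts at the same question (1 = correct, 0 = wrong).
rewards = [1, 0, 0, 1, 1, 0, 0, 0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Correct attempts get positive advantages and wrong ones negative, so the model is pushed toward whatever it did in its better tries. If all 8 attempts score the same, every advantage is zero and that group contributes no learning signal.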
Concurrent with @ShunyuYao12's amazing blog post on the second half of AI, I recently found that another OAI employee had also laid out the same directions for the second half at nonint.com/2025/03/16/the… A consensus at OAI?

📢 Next Talk on Rectified Flow! Speaker: Qiang Liu (UT Austin) 🗓️ April 22nd ⏰ 8:30 am PT | 11:30 am ET | 4:30 pm London | 5:30 pm Paris | 11:30 pm Beijing Zoom: us06web.zoom.us/webinar/tZYtdO… 🔍 More details: rectifiedflow.github.io
How much do video generation models really understand physics? We built a novel benchmark inspired by the intrinsic physical properties of dynamical systems: the conservation laws. Please check out our latest work if you are interested 🥳⚛️
Preprint of today: Zhang, Cherniavskii, Zadaianchuk, Tragoudaras, et al., "MORPHEUS: Benchmarking Physical Reasoning of Video Generative Models with Real Physical Experiments" -- physics-from-video.github.io/morpheus-bench/ Can we use off-the-shelf video models to simulate physics? Perhaps not yet.
I finally wrote another blog post: ysymyth.github.io/The-Second-Hal… AI just keeps getting better over time, but NOW is a special moment that I call “the halftime”. Before it, training > eval. After it, eval > training. The reason: RL finally works. Lmk ur feedback so I’ll polish it.
Halfmoon is Reve Image, and it’s the best image model in the world 🥇