Zhengyang Geng
@ZhengyangGeng
PhD student @SCSatCMU with @zicokolter / curiosity&love / dynamics to super intelligence
Excited to share our work with my amazing collaborators, @Goodeat258, @SimulatedAnneal, @zicokolter, and Kaiming. In a word, we show an “identity learning” approach for generative modeling, by relating the instantaneous/average velocity in an identity. The resulting model,…

Thrilled to introduce AlphaGenome, our new DNA sequence model now available via our AlphaGenome API. Really excited to see how the scientific community uses AlphaGenome’s predictions to understand genome function, drive biological discoveries, develop new treatments, and more...
Introducing AlphaGenome: an AI model to help scientists better understand our DNA – the instruction manual for life 🧬 Researchers can now quickly predict what impact genetic changes could have - helping to generate new hypotheses and drive biological discoveries. ↓
I will present the Diff-Instruct-star poster on Wed 16 Jul, 11 a.m. to 1:30 p.m. PDT, East Exhibition Hall A-B #E-1808. Feel free to join and chat about one-step text-to-image/video models at scale!
Differentiable🤫 Dense🤫 E2E🤫
We should have called it "scaling up rollout", not RL. RL is a necessary evil for the discrete nature of language. My intuition tells me using RL for continuous data (images, video, audio), where differentiable supervision is easily available, is a terrible idea.
🚀 Last weekend at Peking University, I worked with Yifei @WangYw251 and developed Easy Meanflow (github.com/pkulwj1994/eas…), an open-source PyTorch DDP implementation of MeanFlow (a phenomenal paper by my bro @ZhengyangGeng, Mingyang Deng, Xingjian Bai, J. Zico Kolter, and…
now wouldn't that be something...
Let me play a video game of my veo 3 videos already. Google cooked so good 👌 @OfficialLoganK playable world models wen?
now the code is up here: github.com/Gsunshine/mean…
Thanks, @CSProfKGD! I love MeanFlow's elegant formulation of one-step generative modeling. But I was a bit confused about the notation and derivation. Hopefully, this video will help people interested in the paper understand it better.
Fresh out of the oven! 🍞 @jbhuang0604 breaks down Mean Flow from Kaiming’s group in his latest video.
Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
As I was saying: it's happening
We’ve developed Gemini Diffusion: our state-of-the-art text diffusion model. Instead of predicting text directly, it learns to generate outputs by refining noise, step-by-step. This helps it excel at coding and math, where it can iterate over solutions quickly. #GoogleIO
Is a 1-step & 1k-token text model far?
Your LLMs can literally attend Tsinghua University. Can they graduate?
#ICML2025 arxiv.org/abs/2505.02018 We surveyed 100+ courses across 19 departments at Tsinghua University. With expert and model filtering, we curated a graduate-level, Olympiad-difficulty, multi-disciplinary benchmark R-Bench. Even GPT-4o struggles (33.4% on multimodal)!
Why can foundation models transfer to so many downstream tasks? Will the scaling law end? Will pretraining end like Ilya Sutskever predicted? My PhD thesis builds the contexture theory to answer the above. Blog: runtianzhai.com/thesis Paper: arxiv.org/abs/2504.19792 🧵1/12
✨ Love 4o-style image generation but prefer to use Midjourney? Tired of manual prompt crafting from inspo images? PRISM to the rescue! 🖼️→📝→🖼️ We automate black-box prompt engineering—no training, no embeddings, just accurate, readable prompts from your inspo images! 1/🧵
This ICLR is the best conference ever. Attendees are extremely friendly and cuddly. ..What do you mean this is the wrong hall?