Chongxuan Li
@LiChongxuan
Associate Professor at RUC. Deep Generative Models; Machine Learning. Analytic-DPM, DPM-Solver, ProlificDreamer, U-ViT, and LLaDA.
🚀【Large Language Diffusion Models】#DiffusionModels #LLM #LLaDA We built LLaDA-8B, the FIRST non-autoregressive model rivaling LLaMA3! It CRUSHES Llama2-7B on ~20 tasks while unlocking in-context learning, instruction following, and multi-turn chat
LLaDA (the first Large Language Diffusion Model) is *just* out 💥 and I've built a demo, try it out now 👨‍💻 It's mesmerizing to watch the diffusion process 🌀, and being a diffusion model gives you superpowers like "the 4th word has to be pineapple" 🦸 Demo and weights 👇
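The "superpower" in the tweet above is constrained infilling: a masked-diffusion decoder starts from an all-mask sequence and fills positions in parallel, so you can clamp any position up front. A toy sketch of that mechanism (with an assumed dummy sampler standing in for the real model, and a hypothetical `<mask>` token):

```python
import random

MASK = "<mask>"

def toy_diffusion_decode(length, constraints, vocab, steps=4, seed=0):
    """Toy illustration of masked-diffusion decoding: start fully
    masked, clamp constrained positions, then unmask a subset of the
    remaining positions at each step. A real model would choose tokens
    by predicted confidence; here a dummy random sampler fills them."""
    rng = random.Random(seed)
    seq = [MASK] * length
    # Constraints are written once and never remasked.
    for pos, word in constraints.items():
        seq[pos] = word
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # Unmask about half of the remaining positions per step.
        for i in rng.sample(masked, max(1, len(masked) // 2)):
            seq[i] = rng.choice(vocab)
    return seq

out = toy_diffusion_decode(6, {3: "pineapple"}, ["a", "b", "c"])
print(out)  # position 3 is guaranteed to be "pineapple"
```

The point is structural, not the sampler: because decoding is iterative unmasking rather than left-to-right generation, a constraint at any position is trivially enforced.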
nice work!
Please check out our paper and code for more details: Paper: huggingface.co/papers/2505.21… Code: github.com/sail-sg/VeriFr… Joint work with amazing collaborators @NickZhou523786, @anyaasims, @Haonan_Wang_ , @TianyuPang1 , @LiChongxuan , Liang Wang, @mavenlin , @duchao0726 !
Thanks for sharing!
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models "We demonstrate the effectiveness of VRPO by applying it to LLaDA, and the resulting model, LLaDA 1.5, outperforms its SFT-only predecessor consistently and significantly across mathematical…
Scaling Diffusion Transformers Efficiently via μP "In this work, we generalize standard μP to diffusion Transformers and validate its effectiveness through large-scale experiments. " "In both cases, models under μP outperform their respective baselines while requiring small…
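For context on the μP tweet above: the core idea is that hyperparameters tuned on a small base model transfer to wider models if per-parameter learning rates are rescaled with width. A minimal sketch of one commonly cited rule of thumb (Adam-style optimizers, hidden matrix-like weights scale as base_width/width); `mup_lr` and the `param_type` labels are illustrative names, and exact rules depend on the parameterization:

```python
def mup_lr(base_lr, base_width, width, param_type):
    """Sketch of a muP-style learning-rate rule under Adam:
    hidden (matrix-like) parameters scale their learning rate by
    base_width / width; vector-like parameters (biases, layernorm
    gains) keep the base learning rate unchanged."""
    if param_type == "hidden":
        return base_lr * base_width / width
    return base_lr

# Tune at width 256, then transfer to a 1024-wide model:
print(mup_lr(1e-3, 256, 1024, "hidden"))  # 0.00025
print(mup_lr(1e-3, 256, 1024, "bias"))    # 0.001
```

This is why μP lets you sweep hyperparameters cheaply on the small proxy and reuse them at scale, the efficiency claim the tweet quotes.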
thanks for sharing our work!
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
🚀 Understanding R1-Zero-Like Training 🪂 Wait… 🤯 “Aha moment” already exists in DeepSeek-V3-Base before RL-tuning? 📏 RL-tuned output length keeps growing — GRPO bias?? 🥇 Fix the bias → 7B AIME SOTA. 📜More in our new paper: github.com/sail-sg/unders…
🪂Understanding R1-Zero-Like Training: A Critical Perspective * DeepSeek-V3-Base already exhibits "Aha moment" before RL-tuning?? * The ever-increasing output length in RL-tuning might be due to a BIAS in GRPO?? * Getting GRPO Done Right, we achieve a 7B AIME sota! 🧵 📜Full…
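On the GRPO bias mentioned in the two tweets above: GRPO computes group-relative advantages, and the paper argues the per-group std division (together with per-response length normalization in the loss) biases training toward ever-longer outputs. A minimal sketch of the advantage computation, with the std term made optional so the two variants can be compared (function name is illustrative):

```python
import numpy as np

def grpo_advantages(rewards, normalize_std=True):
    """Group-relative advantage: reward minus the group mean,
    optionally divided by the group std -- the normalization term
    that the fix discussed in the thread removes (alongside the
    per-response length normalization in the loss)."""
    r = np.asarray(rewards, dtype=float)
    adv = r - r.mean()
    if normalize_std:
        adv = adv / (r.std() + 1e-8)
    return adv

rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))                       # std-normalized
print(grpo_advantages(rewards, normalize_std=False))  # unnormalized variant
```

With binary rewards the std-normalized variant rescales advantages depending on how mixed the group is, which is one source of the bias the thread investigates.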
Nice work by Luxi and Zihan!
We are excited to introduce FlexWorld, a framework capable of generating 3D scenes from a single image that supports flexible viewpoint navigation, including 360° rotation and zooming. Code and model weights are open-source—try it out! Project Page: ml-gsai.github.io/FlexWorld/
Effective and Efficient Masked Image Generation Models "We propose a unified framework integrating masked image modeling and masked diffusion models" "on ImageNet 256×256, with similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal…
We have released the code and models for LLaDA at github.com/ML-GSAI/LLaDA Thanks Shen for providing a very detailed tutorial on how to train your own LLaDA, plus FAQs.
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers TL;DR: Effortlessly extend your video with just one line of code: freq[k-1]=(2*np.pi)/(L*s).
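The one-liner in the RIFLEx tweet above can be read as: lower one RoPE-like frequency so that a single period spans the extended length. A hedged sketch of where that line would sit; `freq`, `k`, `L`, and `s` follow the tweet's notation, and everything else (the wrapper, the default values) is assumed for illustration:

```python
import numpy as np

def riflex_adjust(freq, k, L, s):
    """Sketch of the RIFLEx-style tweak: given RoPE-like frequencies
    `freq`, original training length L, extrapolation scale s, and a
    chosen intrinsic-frequency index k, set the k-th frequency so one
    full period covers the extended length L*s."""
    freq = np.array(freq, dtype=float)
    freq[k - 1] = (2 * np.pi) / (L * s)  # the "one line" from the tweet
    return freq

freqs = riflex_adjust(np.ones(4), k=2, L=16, s=2.0)
print(freqs[1])  # 2*pi/32 ≈ 0.19635
```

All other frequencies are left untouched; only the selected component is slowed down to avoid repetition when generating beyond the training length.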
😯There may not be 𝗔𝗵𝗮 𝗠𝗼𝗺𝗲𝗻𝘁 in R1-Zero-like training! We observe (superficial) self-reflection patterns in base models and investigate the RL mechanism in R1-Zero-like training. 📚Read more: oatllm.notion.site/oat-zero ⚒️Code: github.com/sail-sg/oat-ze… More results will…
🚨There May Not be Aha Moment in R1-Zero-like Training: oatllm.notion.site/oat-zero A common belief about the recent R1-Zero-like training is that self-reflections *emerge* as a result of RL training. We carefully investigated and showed the opposite. 🧵
Thank you Yisong and the Award Committee for choosing the VAE for the Test of Time award. I would like to congratulate Durk, who was my first (brilliant) student when I moved back to the Netherlands, and who is the main architect of the VAE. It was absolutely fantastic working with him.
Congratulations to @dpkingma and @wellingmax for winning the inaugural ICLR Test of Time Award for their amazing work on Auto-Encoding Variational Bayes, the paper that proposed Variational Autoencoders! arxiv.org/abs/1312.6114
💎 CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model 🔥 Jupyter Notebook + @replicate 🥳 Thanks to @coolboywzy ❤ Yikai Wang ❤ Yifei Chen ❤ Chendong Xiang ❤ @an_epsilon0 ❤ Dajiang Yu ❤ @LiChongxuan ❤ Hang Su ❤ Jun Zhu ❤ 🌐page:…
Nice work. You're welcome to use U-ViT!
Happy to share our work, 📌U-ViT📌, just accepted to #CVPR2023. U-ViT is a transformer-based backbone for diffusion models. Paper: arxiv.org/abs/2209.12152 Code: github.com/baofff/U-ViT (1/4)