Chongxuan Li
@LiChongxuan
Associate Professor at RUC. Deep Generative Models; Machine Learning. Analytic-DPM, DPM-Solver, ProlificDreamer, U-ViT, and LLaDA.
🚀【Large Language Diffusion Models】#DiffusionModels #LLM #LLaDA We built LLaDA-8B, the FIRST non-autoregressive model rivaling LLaMA3! It CRUSHES Llama2-7B on ~20 tasks while unlocking in-context learning, instruction following, and multi-turn chat
LLaDA (the first Large Language Diffusion Model) is *just* out 💥 and I've built a demo, try it out now 👨‍💻 It's mesmerizing to watch the diffusion process 🌀, and being a diffusion model gives you superpowers like "the 4th word has to be pineapple" 🦸 Demo and weights 👇
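The "superpower" in the tweet above is constrained infilling: a masked-diffusion decoder starts from an all-mask sequence and fills positions in parallel, so you can clamp any position up front. A toy sketch of that mechanism (with an assumed dummy sampler standing in for the real model, and a hypothetical `<mask>` token):

```python
import random

MASK = "<mask>"

def toy_diffusion_decode(length, constraints, vocab, steps=4, seed=0):
    """Toy illustration of masked-diffusion decoding: start fully
    masked, clamp constrained positions, then unmask a subset of the
    remaining positions at each step. A real model would choose tokens
    by predicted confidence; here a dummy random sampler fills them."""
    rng = random.Random(seed)
    seq = [MASK] * length
    # Constraints are written once and never remasked.
    for pos, word in constraints.items():
        seq[pos] = word
    for _ in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        # Unmask about half of the remaining positions per step.
        for i in rng.sample(masked, max(1, len(masked) // 2)):
            seq[i] = rng.choice(vocab)
    return seq

out = toy_diffusion_decode(6, {3: "pineapple"}, ["a", "b", "c"])
print(out)  # position 3 is guaranteed to be "pineapple"
```

The point is structural, not the sampler: because decoding is iterative unmasking rather than left-to-right generation, a constraint at any position is trivially enforced.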
nice work!
Please check out our paper and code for more details: Paper: huggingface.co/papers/2505.21… Code: github.com/sail-sg/VeriFr… Joint work with amazing collaborators @NickZhou523786, @anyaasims, @Haonan_Wang_ , @TianyuPang1 , @LiChongxuan , Liang Wang, @mavenlin , @duchao0726 !
Thanks for sharing!
LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models "We demonstrate the effectiveness of VRPO by applying it to LLaDA, and the resulting model, LLaDA 1.5, outperforms its SFT-only predecessor consistently and significantly across mathematical…
Scaling Diffusion Transformers Efficiently via μP "In this work, we generalize standard μP to diffusion Transformers and validate its effectiveness through large-scale experiments. " "In both cases, models under μP outperform their respective baselines while requiring small…
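For context on the μP tweet above: the core idea is that hyperparameters tuned on a small base model transfer to wider models if per-parameter learning rates are rescaled with width. A minimal sketch of one commonly cited rule of thumb (Adam-style optimizers, hidden matrix-like weights scale as base_width/width); `mup_lr` and the `param_type` labels are illustrative names, and exact rules depend on the parameterization:

```python
def mup_lr(base_lr, base_width, width, param_type):
    """Sketch of a muP-style learning-rate rule under Adam:
    hidden (matrix-like) parameters scale their learning rate by
    base_width / width; vector-like parameters (biases, layernorm
    gains) keep the base learning rate unchanged."""
    if param_type == "hidden":
        return base_lr * base_width / width
    return base_lr

# Tune at width 256, then transfer to a 1024-wide model:
print(mup_lr(1e-3, 256, 1024, "hidden"))  # 0.00025
print(mup_lr(1e-3, 256, 1024, "bias"))    # 0.001
```

This is why μP lets you sweep hyperparameters cheaply on the small proxy and reuse them at scale, the efficiency claim the tweet quotes.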
thanks for sharing our work!
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
🚀 Understanding R1-Zero-Like Training 🪂 Wait… 🤯 “Aha moment” already exists in DeepSeek-V3-Base before RL-tuning? 📏 RL-tuned output length keeps growing — GRPO bias?? 🥇 Fix the bias → 7B AIME SOTA. 📜More in our new paper: github.com/sail-sg/unders…
🪂Understanding R1-Zero-Like Training: A Critical Perspective * DeepSeek-V3-Base already exhibits "Aha moment" before RL-tuning?? * The ever-increasing output length in RL-tuning might be due to a BIAS in GRPO?? * Getting GRPO Done Right, we achieve a 7B AIME sota! 🧵 📜Full…
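On the GRPO bias mentioned in the two tweets above: GRPO computes group-relative advantages, and the paper argues the per-group std division (together with per-response length normalization in the loss) biases training toward ever-longer outputs. A minimal sketch of the advantage computation, with the std term made optional so the two variants can be compared (function name is illustrative):

```python
import numpy as np

def grpo_advantages(rewards, normalize_std=True):
    """Group-relative advantage: reward minus the group mean,
    optionally divided by the group std -- the normalization term
    that the fix discussed in the thread removes (alongside the
    per-response length normalization in the loss)."""
    r = np.asarray(rewards, dtype=float)
    adv = r - r.mean()
    if normalize_std:
        adv = adv / (r.std() + 1e-8)
    return adv

rewards = [1.0, 0.0, 0.0, 1.0]
print(grpo_advantages(rewards))                       # std-normalized
print(grpo_advantages(rewards, normalize_std=False))  # unnormalized variant
```

With binary rewards the std-normalized variant rescales advantages depending on how mixed the group is, which is one source of the bias the thread investigates.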
Nice work by Luxi and Zihan!
We are excited to introduce FlexWorld, a framework capable of generating 3D scenes from a single image that supports flexible viewpoint navigation, including 360° rotation and zooming. Code and model weights are open-source—try it out! Project Page: ml-gsai.github.io/FlexWorld/
Effective and Efficient Masked Image Generation Models "We propose a unified framework integrating masked image modeling and masked diffusion models" "on ImageNet 256×256, with similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal…
We have released the code and models for LLaDA at github.com/ML-GSAI/LLaDA Thanks Shen for providing a very detailed tutorial on how to train your own LLaDA, plus FAQs.
RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers TL;DR: Effortlessly extend your video with just one line of code: freq[k-1]=(2*np.pi)/(L*s).
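The one-liner in the RIFLEx tweet above can be read as: lower one RoPE-like frequency so that a single period spans the extended length. A hedged sketch of where that line would sit; `freq`, `k`, `L`, and `s` follow the tweet's notation, and everything else (the wrapper, the default values) is assumed for illustration:

```python
import numpy as np

def riflex_adjust(freq, k, L, s):
    """Sketch of the RIFLEx-style tweak: given RoPE-like frequencies
    `freq`, original training length L, extrapolation scale s, and a
    chosen intrinsic-frequency index k, set the k-th frequency so one
    full period covers the extended length L*s."""
    freq = np.array(freq, dtype=float)
    freq[k - 1] = (2 * np.pi) / (L * s)  # the "one line" from the tweet
    return freq

freqs = riflex_adjust(np.ones(4), k=2, L=16, s=2.0)
print(freqs[1])  # 2*pi/32 ≈ 0.19635
```

All other frequencies are left untouched; only the selected component is slowed down to avoid repetition when generating beyond the training length.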
😯There may not be 𝗔𝗵𝗮 𝗠𝗼𝗺𝗲𝗻𝘁 in R1-Zero-like training! We observe (superficial) self-reflection patterns in base models and investigate the RL mechanism in R1-Zero-like training. 📚Read more: oatllm.notion.site/oat-zero ⚒️Code: github.com/sail-sg/oat-ze… More results will…
🚨There May Not be Aha Moment in R1-Zero-like Training: oatllm.notion.site/oat-zero A common belief about the recent R1-Zero-like training is that self-reflections *emerge* as a result of RL training. We carefully investigated and showed the opposite. 🧵
Thank you Yisong and the Award Committee for choosing the VAE for the Test of Time award. I would like to congratulate Durk, who was my first (brilliant) student when I moved back to the Netherlands, and who is the main architect of the VAE. It was absolutely fantastic working with him.
Congratulations to @dpkingma and @wellingmax for winning the inaugural ICLR Test of Time Award for their amazing work on Auto-Encoding Variational Bayes, the paper that proposed Variational Autoencoders! arxiv.org/abs/1312.6114
💎 CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model 🔥 Jupyter Notebook + @replicate 🥳 Thanks to @coolboywzy ❤ Yikai Wang ❤ Yifei Chen ❤ Chendong Xiang ❤ @an_epsilon0 ❤ Dajiang Yu ❤ @LiChongxuan ❤ Hang Su ❤ Jun Zhu ❤ 🌐page:…
Nice work. You're welcome to use U-ViT!
Happy to share our work, 📌U-ViT📌, just accepted to #CVPR2023. U-ViT is a transformer-based backbone for diffusion models. Paper: arxiv.org/abs/2209.12152 Code: github.com/baofff/U-ViT (1/4)