Rui Lu
@RayLu_THU
PhD student at @Tsinghua_Uni studying machine learning theory, graduated from the Yao Class. Also a YouTuber @ 漫士沉思录 (manshi_math)
🧐Two papers, opposite opinions. Ours: High-entropy tokens drive all performance gains in LLM RL. Another: Don’t let low-prob (often high-entropy) tokens over-dominate. Both are valid. Why? 💡Model size matters. Larger LLMs support our view; smaller LLMs support theirs. 🧵⬇️
How does a reasoning model actually reason? Our recent study shows that only the ~20% of tokens with high entropy play a critical role in deciding the reasoning trajectory! Check us out
🚨Beyond 80/20 in LLM reasoning🚨 Dropping the 80% of low-entropy tokens in RL greatly boosts performance 🔗arxiv.org/abs/2506.01939 🏆Zero-RL SoTA: 63.5/68.1 (AIME24), 56.7 (AIME25) 🚀Insights: 1. RL retains the base model's entropy patterns 2. High-entropy tokens drive all RL improvement ⬇️
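For readers who want to see what the token-masking idea looks like in code, here is a minimal sketch (not the paper's exact recipe): keep only the highest-entropy tokens when forming the policy-gradient loss. The 20% cutoff, tensor names, and the per-batch quantile threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def entropy_masked_pg_loss(logits, actions, advantages, keep_frac=0.2):
    """Policy-gradient loss restricted to the top `keep_frac` highest-entropy tokens.

    logits:     (batch, seq, vocab) model outputs for each generated token
    actions:    (batch, seq) sampled token ids
    advantages: (batch, seq) per-token advantage estimates (e.g. from PPO/GRPO)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Per-token entropy of the sampling distribution.
    entropy = -(probs * log_probs).sum(dim=-1)                      # (batch, seq)
    # Keep only the highest-entropy tokens (illustrative 20% cutoff per batch).
    threshold = torch.quantile(entropy.flatten(), 1.0 - keep_frac)
    mask = (entropy >= threshold).float()                           # (batch, seq)

    token_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    # Standard advantage-weighted objective, applied only to the masked tokens.
    loss = -(mask * advantages * token_log_probs).sum() / mask.sum().clamp(min=1.0)
    return loss
```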
First time in my life, finally got a Best Paper award at a workshop. Direction > effort, indeed.

The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI…
The only thing that can stop the progress of AGI... is Overleaf before the NeurIPS deadline🙃

❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/
Attending #ICLR2025 in Singapore! Welcome to stop by our poster and chat tomorrow morning, poster ID 591.😃


🚨Is the reasoning model learning different abilities in RL? Understand our paper in 1⃣️ video, which also covers frequently asked questions. Check it out!
Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Our new paper investigates this question and has sparked active discussions. In the video, the frequent Q&A starts at 1:28, covering common questions on pass@k, the takeaways, etc. See limit-of-RLVR.github.io
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? "we uncover that RL-trained models excel at low k (e.g., pass@1) but are consistently outperformed by base models at high k (e.g., pass@256)." "RLVR enhances sampling efficiency,…
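Side note on the metric: pass@k is usually computed with the standard unbiased estimator from the Codex paper, sketched below; the n/c numbers here are made-up examples for illustration, not results from the quoted paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn (without replacement) from n generations is correct, given c correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative example: 256 samples per problem, 40 of them correct.
print(pass_at_k(n=256, c=40, k=1))    # ≈ 0.156 (pass@1)
print(pass_at_k(n=256, c=40, k=256))  # 1.0     (pass@256)
```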
Can video generation models really become world models and understand physical laws? We conduct a systematic study in a synthetic setting. Check out our paper!
Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video gen model is able to learn physical laws. There are three key messages to take home: 1⃣The model generalises…
Still need finetuning to do safety alignment for LLMs? Check out our new paper! We simply modify the LLM parameters selected by a linear probe, which greatly reduces jailbreaking behavior without hurting performance! Details in the arXiv link.
📢 New paper alert! 📢 🧠💉"Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing" 🚀 Modulate LLM behaviors through direct parameter editing. 🚀 Achieve up to 90% detoxification with inference-level computational cost! 💡🤖 arXiv: arxiv.org/abs/2407.08770
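To make the idea concrete, here is a purely illustrative sketch of one way a linear probe could drive a direct parameter edit: fit a probe on hidden states that separates the unwanted behavior, then push a chosen weight matrix away from that direction. The function names, layer choice, and editing rule are assumptions for illustration only; the paper's actual procedure is in the arXiv link.

```python
import torch

def fit_behavior_probe(hidden_states, labels, steps=200, lr=0.1):
    """Fit a linear probe w on hidden states to separate the two behaviors.

    hidden_states: (N, d) activations collected at some layer
    labels:        (N,) 1 for the unwanted behavior (e.g. toxic), 0 otherwise
    """
    d = hidden_states.shape[1]
    w = torch.zeros(d, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=lr)
    for _ in range(steps):
        logits = hidden_states @ w + b
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach() / w.detach().norm()

@torch.no_grad()
def edit_weight(weight, probe_dir, alpha=1.0):
    """Shift a weight matrix so its outputs move away from the probe direction.

    weight:    (d_out, d_in) parameter to edit (e.g. an MLP down-projection),
               assuming its output lives in the space the probe was fit on
    probe_dir: (d_out,) unit vector from fit_behavior_probe
    alpha:     editing strength (hypothetical knob)
    """
    # Remove the component of the outputs that points along the probe direction.
    weight -= alpha * torch.outer(probe_dir, probe_dir @ weight)
    return weight
```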
How to capitalize on #GenerativeAI and #diffusion models for modeling complex data and structured optimization, from images to proteins? Check out my talk "Diffusion models for Generative Optimization" at @broadinstitute, Harvard, and MIT last week. YouTube: youtube.com/watch?v=hDRDx5…
Thank you for your appreciation! Reducing the communication cost is exactly what we want, since everybody needs to go through thousands of posters in two hours.
«Understanding, Predicting, and Better Resolving Q-Value Divergence in Offline-RL» is my favourite poster so far. Clearly stating the takeaway, telling me why, how, and what it predicts and resolves. Communication overhead reduced to the minimum. Good job. 🥳🥳🥳
Check out our #NeurIPS2023 poster! We investigate the Q-value divergence phenomenon in offline RL and find self-excitation to be the main reason. Using LayerNorm in RL models can fundamentally prevent this from happening. arxiv.org/pdf/2310.04411…
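For context, "using LayerNorm in RL models" amounts to a small architectural change in the Q-network. A minimal sketch below; the hidden sizes and the continuous-action setup are placeholder assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q(s, a) MLP with LayerNorm after each hidden layer, the kind of change
    argued to suppress the self-excitation behind Q-value divergence."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        # Concatenate state and action, as in standard continuous-control critics.
        return self.net(torch.cat([state, action], dim=-1))
```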
