Rui Lu
@RayLu_THU
PhD student at @Tsinghua_Uni studying machine learning theory, graduated from the Yao Class. Also a YouTuber @ 漫士沉思录 (manshi_math)
🧐Two papers, opposite opinions. Ours: High-entropy tokens drive all performance gains in LLM RL. Another: Don’t let low-prob (often high-entropy) tokens over-dominate. Both are valid. Why? 💡Model size matters. Larger LLMs support our view; smaller LLMs support theirs. 🧵⬇️
How does a reasoning model actually reason? Our recent study shows that only the ~20% of tokens with high entropy play a critical role in deciding the reasoning trajectory! Check us out
🚨Beyond 80/20 in LLM reasoning🚨 Dropping the 80% of low-entropy tokens in RL greatly boosts performance 🔗arxiv.org/abs/2506.01939 🏆Zero-RL SoTA: 63.5/68.1 (AIME24), 56.7 (AIME25) 🚀Insights: 1. RL retains the base model's entropy patterns 2. High-entropy tokens drive all RL improvement ⬇️
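For readers who want to see what the token-masking idea looks like in code, here is a minimal sketch (not the paper's exact recipe): keep only the highest-entropy tokens when forming the policy-gradient loss. The 20% cutoff, tensor names, and the per-batch quantile threshold are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def entropy_masked_pg_loss(logits, actions, advantages, keep_frac=0.2):
    """Policy-gradient loss restricted to the top `keep_frac` highest-entropy tokens.

    logits:     (batch, seq, vocab) model outputs for each generated token
    actions:    (batch, seq) sampled token ids
    advantages: (batch, seq) per-token advantage estimates (e.g. from PPO/GRPO)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Per-token entropy of the sampling distribution.
    entropy = -(probs * log_probs).sum(dim=-1)                      # (batch, seq)
    # Keep only the highest-entropy tokens (illustrative 20% cutoff per batch).
    threshold = torch.quantile(entropy.flatten(), 1.0 - keep_frac)
    mask = (entropy >= threshold).float()                           # (batch, seq)

    token_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    # Standard advantage-weighted objective, applied only to the masked tokens.
    loss = -(mask * advantages * token_log_probs).sum() / mask.sum().clamp(min=1.0)
    return loss
```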
First time in my life, finally got a Best Paper award at a workshop. Direction > effort, indeed.

The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI…
The only thing that can stop the progress of AGI... is Overleaf before the NeurIPS deadline🙃

❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/
Attending #ICLR2025 in Singapore! Welcome to stop by our poster and chat tomorrow morning, poster ID 591.😃


🚨Is the reasoning model learning different abilities in RL? Understand our paper in 1⃣️ video, which also covers frequently asked questions. Check it out!
Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Our new paper investigates this question and has sparked active discussions. In the video, the frequent Q&A starts at 1:28, covering common questions on pass@k, the takeaways, etc. See limit-of-RLVR.github.io
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? "we uncover that RL-trained models excel at low k (e.g., pass@1) but are consistently outperformed by base models at high k (e.g., pass@256)." "RLVR enhances sampling efficiency,…
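Side note on the metric: pass@k is usually computed with the standard unbiased estimator from the Codex paper, sketched below; the n/c numbers here are made-up examples for illustration, not results from the quoted paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn (without replacement) from n generations is correct, given c correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative example: 256 samples per problem, 40 of them correct.
print(pass_at_k(n=256, c=40, k=1))    # ≈ 0.156 (pass@1)
print(pass_at_k(n=256, c=40, k=256))  # 1.0     (pass@256)
```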
Can video generation models really become world models and understand physical laws? We conduct a systematic study in a synthetic setting. Check out our paper!
Curious whether video generation models (like #SORA) qualify as world models? We conduct a systematic study to answer this question by investigating whether a video gen model is able to learn physical laws. There are three key messages to take home: 1⃣The model generalises…
Still need finetuning to do safety alignment for LLMs? Check out our new paper! We simply modify the LLM parameters selected by a linear probe, which greatly reduces jailbreaking behavior without hurting performance! Details in the arXiv link.
📢 New paper alert! 📢 🧠💉"Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing" 🚀 Modulate LLM behaviors through direct parameter editing. 🚀 Achieve up to 90% detoxification with inference-level computational cost! 💡🤖 arXiv: arxiv.org/abs/2407.08770
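To make the idea concrete, here is a purely illustrative sketch of one way a linear probe could drive a direct parameter edit: fit a probe on hidden states that separates the unwanted behavior, then push a chosen weight matrix away from that direction. The function names, layer choice, and editing rule are assumptions for illustration only; the paper's actual procedure is in the arXiv link.

```python
import torch

def fit_behavior_probe(hidden_states, labels, steps=200, lr=0.1):
    """Fit a linear probe w on hidden states to separate the two behaviors.

    hidden_states: (N, d) activations collected at some layer
    labels:        (N,) 1 for the unwanted behavior (e.g. toxic), 0 otherwise
    """
    d = hidden_states.shape[1]
    w = torch.zeros(d, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([w, b], lr=lr)
    for _ in range(steps):
        logits = hidden_states @ w + b
        loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach() / w.detach().norm()

@torch.no_grad()
def edit_weight(weight, probe_dir, alpha=1.0):
    """Shift a weight matrix so its outputs move away from the probe direction.

    weight:    (d_out, d_in) parameter to edit (e.g. an MLP down-projection),
               assuming its output lives in the space the probe was fit on
    probe_dir: (d_out,) unit vector from fit_behavior_probe
    alpha:     editing strength (hypothetical knob)
    """
    # Remove the component of the outputs that points along the probe direction.
    weight -= alpha * torch.outer(probe_dir, probe_dir @ weight)
    return weight
```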
How to capitalize on #GenerativeAI and #diffusion models for modeling complex data and structured optimization, from images to proteins? Check out my talk "Diffusion models for Generative Optimization" at @broadinstitute, Harvard, and MIT last week. YouTube: youtube.com/watch?v=hDRDx5…
Thank you for your appreciation! Reducing the communication cost is exactly what we want, since everybody needs to go through thousands of posters in two hours.
«Understanding, Predicting, and Better Resolving Q-Value Divergence in Offline-RL» is my favourite poster so far. Clearly stating the takeaway, telling me why, how, and what it predicts and resolves. Communication overhead reduced to the minimum. Good job. 🥳🥳🥳
Check out our #NeurIPS2023 poster! We investigate the Q-value divergence phenomenon in offline RL and find self-excitation to be the main reason. Using LayerNorm in RL models can fundamentally prevent this from happening. arxiv.org/pdf/2310.04411…
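For context, "using LayerNorm in RL models" amounts to a small architectural change in the Q-network. A minimal sketch below; the hidden sizes and the continuous-action setup are placeholder assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q(s, a) MLP with LayerNorm after each hidden layer, the kind of change
    argued to suppress the self-excitation behind Q-value divergence."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        # Concatenate state and action, as in standard continuous-control critics.
        return self.net(torch.cat([state, action], dim=-1))
```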
