Junlong Li
@lockonlvange
Incoming PhD @hkust | CS MS/BS student @sjtu1896 | Interning @deepseek_ai | Ex Intern MSRA @MSFTResearch
Reasoning models like O1/R1 are powerful but ... waste sooooooo many tokens overthinking even simple questions like "1+1"! If you are also troubled by this, don't miss LASER! It makes answers much shorter and even more accurate, beating various baselines.
“What is the answer to 1 + 1?” Large Reasoning Models (LRMs) may generate 1500+ tokens just to answer this trivial question. Too much thinking 🤯 Can LRMs be both Faster AND Stronger? Yes. Introducing LASER💥: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping…
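For intuition, here is a minimal Python sketch of length-based reward shaping in general. This is not LASER's actual formulation; the function name, length budget, and bonus shape are all illustrative assumptions.

```python
# Minimal sketch of length-based reward shaping for RL fine-tuning.
# NOT LASER's exact reward; thresholds and names are illustrative only.

def shaped_reward(is_correct: bool, num_tokens: int, target_len: int = 1024) -> float:
    """Reward correctness, plus a bonus for staying under a length budget.

    - Wrong answers get 0 regardless of length, so the model is not pushed
      to truncate reasoning it still needs.
    - Correct answers get 1.0, plus a bonus that shrinks as the response
      approaches (or exceeds) the target length.
    """
    if not is_correct:
        return 0.0
    # Linear length bonus in [0, 1]: full bonus for very short correct answers,
    # no bonus once the response exceeds the target length.
    length_bonus = max(0.0, 1.0 - num_tokens / target_len)
    return 1.0 + length_bonus


# A correct 120-token answer now outranks a correct 1500-token answer.
print(shaped_reward(True, 120))    # ~1.88
print(shaped_reward(True, 1500))   # 1.0
print(shaped_reward(False, 120))   # 0.0
```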
Excited to share DreamOn—our latest work teaching diffusion LMs to dynamically expand and contract beyond fixed-size canvases!
We present DreamOn: a simple yet effective method for variable-length generation in diffusion language models. Our approach boosts code infilling performance significantly and even catches up with oracle results.
🚀 Thrilled to announce Dream-Coder 7B — the most powerful open diffusion code LLM to date.
👇 This nice guy ❤️ will help us present CodeI/O (arxiv.org/abs/2502.07316) at Oral Session 6A, Applications in Agents and Coding, Thu 17 Jul, 4:00–4:15 p.m. PDT. Stop by if you're there and interested.
Attending #ICML2025 🇨🇦 this week! Will be presenting Aguvis (arxiv.org/abs/2412.04454) on July 15 at 11am, and joining Computer Use Agent Workshop @workshopcua on July 19. If you’re into digital agent research, especially around computer/browser use, let’s grab a coffee!
Excited to share our new survey on the reasoning paradigm shift from "Think with Text" to "Think with Image"! 🧠🖼️ Our work offers a roadmap for more powerful & aligned AI. 🚀 📜 Paper: arxiv.org/pdf/2506.23918 ⭐ GitHub (400+🌟): github.com/zhaochen0110/A…
Glad to see @ChengZhoujun demystifying the data-mixture recipe across a broad set of domains in RLVR. Great work, congrats!
🤯 What we know about RL for reasoning might not hold outside math and code? We revisit established findings on RL for LLM reasoning across six domains (Math, Code, Science, Logic, Simulation, Tabular) and find that previous conclusions drawn on math and code are surprisingly…
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective "We introduce GURU, a curated RL reasoning corpus of 92K verifiable examples spanning six reasoning domains—Math, Code, Science, Logic, Simulation, and Tabular—each built through domain-specific…
🧵Interesting paper—great to see the emphasis on large token counts, which is always appreciated. 😅But some of the results are... puzzling. For example, Table 3 essentially suggests that MegaMath is a non-math corpus. This is weird, especially given the care we've taken during…
[1/5] 🚀 Meet Essential-Web v1.0, a 24-trillion-token pre-training dataset with rich metadata built to effortlessly curate high-performing datasets across domains and use cases!
We studied both rule-based and model-based verifiers and found that each has unique limitations. Rule-based verifiers are often unreliable, even in math, and are unavailable in many domains. Model-based verifiers can be easily hacked. In our paper, we construct simple…
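To illustrate the rule-based failure mode, here is a toy exact-match verifier (not any of the verifiers studied in the paper); it already shows how a correct answer written in a different form gets rejected and earns no reward.

```python
# Toy illustration of why naive rule-based verification is brittle.
# Purely hypothetical; not the verifiers evaluated in the paper.

def exact_match_verifier(prediction: str, reference: str) -> bool:
    """Accept only if the predicted answer string matches the reference exactly."""
    return prediction.strip() == reference.strip()

# The same correct value, written differently, is marked wrong,
# so the RL policy gets no reward for a perfectly valid answer.
print(exact_match_verifier("1/2", "0.5"))    # False (should be True)
print(exact_match_verifier("x = 3", "3"))    # False (should be True)
print(exact_match_verifier("0.5", "0.5"))    # True
```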
🔍 Are Verifiers Trustworthy in RLVR? Our paper, Pitfalls of Rule- and Model-based Verifiers, exposes the critical flaws in reinforcement learning verification for mathematical reasoning. 🔑 Key findings: 1️⃣ Rule-based verifiers miss correct answers, especially when presented in…
Lack of RL Logical Reasoning data? Excited to share SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond 🚀 Building strong logical reasoning through RLVR 📄Paper: huggingface.co/papers/2505.19… 💻 Code: github.com/MiniMax-AI/Syn… (1/n)
I really like this paper. I'd like to echo the point that RL-related conclusions should be drawn cautiously when they are based only on Qwen models and only on math tasks. Our SimpleRL-Zoo paper is one of the few that actually conducts RLVR across diverse models: arxiv.org/abs/2503.18892
🤯 We cracked RLVR with... Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Random rewards: +21% - Incorrect rewards: +25% - (FYI) Ground-truth rewards: +28.8% How could this even work⁉️ Here's why: 🧵 Blogpost: tinyurl.com/spurious-rewar…
Exciting news! Enabling AI to truly "think" with images 🖼️🧠 is a huge challenge. We introduce OPENTHINKIMG, the first open-source E2E framework, powered by our novel V-TOOLRL! Paper: arxiv.org/pdf/2505.08617 #AI #LVLM #RL #OpenSource
We've released the code for LegoGPT. This autoregressive model generates physically stable and buildable designs from text prompts, by integrating physics laws and assembly constraints into LLM training and inference. This work is led by PhD students @AvaLovelace0, @kangle_deng,…
Are attention heads the right units to mechanistically understand Transformers' attention behavior? Probably not, due to attention superposition! We extracted interpretable attention units in LMs and found finer-grained versions of many known and novel attention behaviors. 🧵1/N
Excited to share our work studying CLIP and LLaVA on chart understanding tasks 🔍 Since the CLIP vision encoder is LLaVA's source of visual information, have you ever wondered how CLIP's capabilities affect LLaVA? 📄Paper: arxiv.org/abs/2503.18435 📷Code: github.com/hkust-nlp/Visi… (1/5)
🥁🥁 Happy to share our latest efforts on math pre-training data, the MegaMath dataset! This is a 9-month project starting from 2024’s summer, and we finally deliver: the largest math pre-training data to date containing 💥370B 💥tokens of web, code, and synthetic data!
Two months ago, we open-sourced the first R1-like zero RL training project on math with the Qwen2.5-math model. Since then, many great works performed successful zero RL training, mostly based on Qwen2.5 models. 🚀Now, we introduce SimpleRL-Zoo, a deep investigation of zero RL…
🚀 Excited to introduce Predictive Data Selection (PreSelect): The Data That Predicts Is the Data That Teaches 🚀 We find that data on which model losses are predictive of downstream abilities also contributes effectively to learning. Then we further propose predictive data…
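A rough sketch of the "predictive data" idea as stated here: correlate a document's loss across several reference models with those models' downstream scores, and keep documents where lower loss tracks stronger ability. The function name and numbers below are illustrative assumptions, not PreSelect's actual scoring pipeline.

```python
# Rough sketch of scoring a document by how well per-model losses on it
# track the models' downstream ability. Illustrative only; the actual
# PreSelect procedure is described in the paper.
import numpy as np

def predictive_strength(doc_losses: np.ndarray, downstream_scores: np.ndarray) -> float:
    """Negated correlation between a document's loss under several reference
    models and those models' downstream benchmark scores.

    A strongly negative loss-vs-ability correlation (lower loss on this doc
    <-> stronger model) means the document is "predictive" and, per the
    hypothesis, more useful to train on.
    """
    corr = np.corrcoef(doc_losses, downstream_scores)[0, 1]
    return -corr  # higher = more predictive of downstream ability

# Example: 4 reference models, their losses on one document, and their scores.
losses = np.array([2.9, 2.4, 2.0, 1.6])        # weaker -> stronger models
scores = np.array([0.31, 0.42, 0.55, 0.63])    # downstream ability
print(predictive_strength(losses, scores))     # close to 1.0 -> keep this doc
```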
🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview Optimized throughput and latency via: 🔧 Cross-node EP-powered batch scaling 🔄 Computation-communication overlap ⚖️ Load balancing Statistics of DeepSeek's Online Service: ⚡ 73.7k/14.8k…