Zhoujun (Jorge) Cheng
@ChengZhoujun
CS Ph.D. @UCSanDiego | Prev. @XLangNLP @MSFTResearch @sjtu1896
🤯What we know about RL for reasoning might not hold outside math and code? We revisit established findings on RL for LLM reasoning across six domains (Math, Code, Science, Logic, Simulation, Tabular) and find that previous conclusions drawn on math and code are surprisingly…

Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
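Since the tweet doesn't spell out the mechanics, here is a minimal sketch of GSPO's core idea: a single length-normalized, sequence-level importance ratio per sampled response (instead of token-level ratios), with PPO-style pessimistic clipping. The function name, signature, and pure-Python framing are illustrative, not from any official implementation; group-normalized advantages are assumed to be precomputed.

```python
import math

def gspo_surrogate(logp_new, logp_old, advantages, lengths, eps=0.2):
    """Sketch of the GSPO clipped surrogate objective (to be maximized).

    logp_new, logp_old: summed log-probs of each full response under the
        current / behavior policy.
    advantages: group-normalized rewards, one scalar per response.
    lengths: response lengths in tokens (used for length normalization).
    """
    total = 0.0
    for ln, lo, adv, length in zip(logp_new, logp_old, advantages, lengths):
        # Sequence-level, length-normalized importance ratio:
        #   s_i = (pi_new(y_i|x) / pi_old(y_i|x)) ** (1 / |y_i|)
        s = math.exp((ln - lo) / length)
        # Pessimistic clipping applied once per sequence, not per token
        s_clip = min(max(s, 1.0 - eps), 1.0 + eps)
        total += min(s * adv, s_clip * adv)
    return total / len(advantages)
```

The key design choice, per the paper's framing, is that clipping and importance weighting happen at the unit of reward (the whole sequence), which is what makes training stable for large MoE models.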
Wrapped up a SWE-Perf website redesign using Qwen3-Coder on AnyCoder (huggingface.co/spaces/akhaliq…). The process was incredibly fast and smooth! One question for Qwen devs, though: did you pretrain a secret love for the color purple into the coder's persona? 😉
Countless iterations went into cooking it, but the process is satisfying. I still believe we could pour more data into each stage if we had more hands, so the potential is unlimited and the scaling law hasn’t hit the wall yet! Towards Digital Agents🤖 We are already on the way.
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
Excited to bring Qwen3-Coder into the browser and terminal world! Building the scaffolding and environments for this big guy to play and learn in is tough but incredibly "rewarding". Agentic coding and browsing are arguably the two most important skills for digital agents: they…
Apart from the performance, it’s pure entertainment just watching Qwen3‑Coder build Qwen Code all by itself. Agentic coding is really something: it explores, understands, plans, and acts seamlessly. Honored to be “in the game”—even if my entire work so far is smashing the Enter…
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 github.com/microsoft/Arch… (1/4)
🔥 LLMs can fix bugs, but can they make your code faster? We put them to the test on real-world repositories, and the results are in! 🚀 New paper: "SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?" Key findings: 1️⃣ We introduce SWE-Perf, the…
🚨 70 million US workers are about to face their biggest workplace transformation due to AI agents. But nobody asks them what they want. While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵
If you are at #icml25 and are interested in RL algorithms, scaling laws for RL, and test-time scaling (& related stuff), come talk to us at various poster sessions (details ⬇️). We are also presenting some things at workshops later in the week, more on that later.
🚀 Thrilled to announce Dream-Coder 7B — the most powerful open diffusion code LLM to date.
👇 This nice guy ❤️ will help us present CodeI/O (arxiv.org/abs/2502.07316) at Oral Session 6A, Applications in Agents and Coding, Thu 17 Jul, 4:00–4:15 p.m. PDT. Take a look if you're there and interested.
Attending #ICML2025 🇨🇦 this week! Will be presenting Aguvis (arxiv.org/abs/2412.04454) on July 15 at 11am, and joining Computer Use Agent Workshop @workshopcua on July 19. If you’re into digital agent research, especially around computer/browser use, let’s grab a coffee!
I still find it mysterious whether and how intelligence and capabilities transfer between domains and skills - from meta-learning in the early days to the more recent question of whether solving math helps write a good essay. Sometimes I feel a bit pessimistic given not enough…
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper: arxiv.org/abs/2507.00163
🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…
If you're attending #ICML2025, check out our 💭 Agent Workflow Memory for online adaptive agents: Jul 17 4:30-7pm @ West Hall 🔎 RAGGED for designing scalable and stable RAG systems: Jul 16 11:00-13:30 @ East Hall Computer Use Agent Workshop on Jul 19 🌐 "Universal Retrieval for…
Our Coconut work is now accepted at COLM'25. Thanks to all the reviewers for their support and constructive feedback!
Our Coconut work (learning continuous latent CoT) is now open-sourced. Feel free to play with it: github.com/facebookresear…
🚀 Check out our recent work Afterburner, where reinforcement learning powers self-improving code efficiency optimization! 💻✨
🚀 Thrilled to announce our new paper: Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization Stop settling for LLM-generated code that just works. Demand code that performs! Our new RL framework boosts Pass@1 +15% and significantly…
🤔 Ever wonder where reinforcement learning actually boosts (or hurts) LLM’s reasoning capabilities? Meet SPARKLE—a new analysis framework that dissects gains from RL in planning, knowledge integration, and subproblem solving. 📄 Paper: arxiv.org/abs/2506.04723 🌐 Project:…
ASI is now accepted to @COLM_conf #COLM2025! 🍁 🔗 arxiv.org/abs/2504.06821
Meet ASI: Agent Skill Induction A framework for online programmatic skill learning — no offline data, no training. 🧠 Build reusable skills at test time 📈 +23.5% success, +15.3% efficiency 🌐 Scales to long-horizon tasks, transfers across websites Let's dive in! 🧵