Zhoujun (Jorge) Cheng
@ChengZhoujun
CS Ph.D. @UCSanDiego | Prev. @XLangNLP @MSFTResearch @sjtu1896
🤯What we know about RL for reasoning might not hold outside math and code? We revisit established findings on RL for LLM reasoning across six domains (Math, Code, Science, Logic, Simulation, Tabular) and find that previous conclusions drawn on math and code are surprisingly…

Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…
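Since the tweet doesn't spell out the mechanics, here is a minimal sketch of GSPO's core idea: a single length-normalized, sequence-level importance ratio per sampled response (instead of token-level ratios), with PPO-style pessimistic clipping. The function name, signature, and pure-Python framing are illustrative, not from any official implementation; group-normalized advantages are assumed to be precomputed.

```python
import math

def gspo_surrogate(logp_new, logp_old, advantages, lengths, eps=0.2):
    """Sketch of the GSPO clipped surrogate objective (to be maximized).

    logp_new, logp_old: summed log-probs of each full response under the
        current / behavior policy.
    advantages: group-normalized rewards, one scalar per response.
    lengths: response lengths in tokens (used for length normalization).
    """
    total = 0.0
    for ln, lo, adv, length in zip(logp_new, logp_old, advantages, lengths):
        # Sequence-level, length-normalized importance ratio:
        #   s_i = (pi_new(y_i|x) / pi_old(y_i|x)) ** (1 / |y_i|)
        s = math.exp((ln - lo) / length)
        # Pessimistic clipping applied once per sequence, not per token
        s_clip = min(max(s, 1.0 - eps), 1.0 + eps)
        total += min(s * adv, s_clip * adv)
    return total / len(advantages)
```

The key design choice, per the paper's framing, is that clipping and importance weighting happen at the unit of reward (the whole sequence), which is what makes training stable for large MoE models.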
Wrapped up a SWE-Perf website redesign using Qwen3-Coder on AnyCoder (huggingface.co/spaces/akhaliq…). The process was incredibly fast and smooth! One question for Qwen devs, though: did you pretrain a secret love for the color purple into the coder's persona? 😉
Countless iterations went into cooking it, but the process is satisfying. I still believe we could pour more data into each stage if we had more hands, so the potential is unlimited and the scaling law hasn’t hit the wall yet! Towards Digital Agents🤖 We are already on the way.
>>> Qwen3-Coder is here! ✅ We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
Excited to bring Qwen3-Coder into the browser and terminal world! Building the scaffolding and environments for this big guy to play and learn in is tough but incredibly "rewarding". Agentic coding and browsing are arguably the two most important skills for digital agents: they…
Apart from the performance, it’s pure entertainment just watching Qwen3‑Coder build Qwen Code all by itself. Agentic coding is really something: it explores, understands, plans, and acts seamlessly. Honored to be “in the game”—even if my entire work so far is smashing the Enter…
We’re open-sourcing the pre-training code for Phi4-mini-Flash, our SoTA hybrid model that delivers 10× faster reasoning than Transformers — along with μP++, a suite of simple yet powerful scaling laws for stable large-scale training. 🔗 github.com/microsoft/Arch… (1/4)
🔥 LLMs can fix bugs, but can they make your code faster? We put them to the test on real-world repositories, and the results are in! 🚀 New paper: "SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?" Key findings: 1️⃣ We introduce SWE-Perf, the…
🚨 70 million US workers are about to face their biggest workplace transformation due to AI agents. But nobody asks them what they want. While AI races to automate everything, we took a different approach: auditing what workers want vs. what AI can do across the US workforce.🧵
If you are at #icml25 and are interested in RL algorithms, scaling laws for RL, and test-time scaling (& related stuff), come talk to us at various poster sessions (details ⬇️). We are also presenting some things at workshops later in the week, more on that later.
🚀 Thrilled to announce Dream-Coder 7B — the most powerful open diffusion code LLM to date.
👇 This nice guy ❤️ will help us present CodeI/O (arxiv.org/abs/2502.07316) at Oral Session 6A, Applications in Agents and Coding, Thu 17 Jul, 4:00–4:15 p.m. PDT. Take a look if you're there and interested.
Attending #ICML2025 🇨🇦 this week! Will be presenting Aguvis (arxiv.org/abs/2412.04454) on July 15 at 11am, and joining Computer Use Agent Workshop @workshopcua on July 19. If you’re into digital agent research, especially around computer/browser use, let’s grab a coffee!
I still find it mysterious whether and how intelligence and capabilities transfer between domains and skills - from meta-learning in the early days to the more recent question of whether solving math helps write a good essay. Sometimes I feel a bit pessimistic given not enough…
Prompting is our most successful tool for exploring LLMs, but the term evokes eye-rolls and grimaces from scientists. Why? Because prompting as scientific inquiry has become conflated with prompt engineering. This is holding us back. 🧵and new paper: arxiv.org/abs/2507.00163
🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…
If you're attending #ICML2025, check out our 💭 Agent Workflow Memory for online adaptive agents: Jul 17 4:30-7pm @ West Hall 🔎 RAGGED for designing scalable and stable RAG systems: Jul 16 11:00-13:30 @ East Hall Computer Use Agent Workshop on Jul 19 🌐 "Universal Retrieval for…
Our Coconut work is now accepted at COLM'25. Thanks to all the reviewers for their support and constructive feedback!
Our Coconut work (learning continuous latent CoT) is now open-sourced. Feel free to play with it: github.com/facebookresear…
🚀 Check out our recent work Afterburner, where reinforcement learning powers self-improving code efficiency optimization! 💻✨
🚀 Thrilled to announce our new paper: Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization Stop settling for LLM-generated code that just works. Demand code that performs! Our new RL framework boosts Pass@1 +15% and significantly…
🤔 Ever wonder where reinforcement learning actually boosts (or hurts) LLM’s reasoning capabilities? Meet SPARKLE—a new analysis framework that dissects gains from RL in planning, knowledge integration, and subproblem solving. 📄 Paper: arxiv.org/abs/2506.04723 🌐 Project:…
ASI is now accepted to @COLM_conf #COLM2025! 🍁 🔗 arxiv.org/abs/2504.06821
Meet ASI: Agent Skill Induction A framework for online programmatic skill learning — no offline data, no training. 🧠 Build reusable skills at test time 📈 +23.5% success, +15.3% efficiency 🌐 Scales to long-horizon tasks, transfers across websites Let's dive in! 🧵