Michael Luo
@michaelzluo
Project Lead @Agentica_ | Prev. Researcher @GoogleDeepMind | PhD at UC Berkeley @berkeley_ai
🚀 The era of overpriced, black-box coding assistants is OVER. Thrilled to lead the @Agentica_ team in open-sourcing and training DeepSWE—a SOTA software engineering agent trained end-to-end with @deepseek_ai-like RL on Qwen3-32B, hitting 59% on SWE-Bench Verified and topping the…
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE…
🔮 The future is AGENTS for all applications. In the first 6 months we perfected RL for verifiable-reward reasoning—single-step chain-of-thought, deterministic answers. Now, the next years belong to multi-agent systems—multiple steps (not necessarily with explicit thought), multiple agents…
We've noticed that quite a lot of sources claim credit for one-off pipelining, which originated in our work DeepCoder. Not only SemiAnalysis @dylan522p but also bigger companies (e.g., Meta's LLAMA RL paper, see Figure 2) refuse to cite us while claiming credit.
Unreal. 🤯 Someone just pointed out to me privately yet another case of plagiarism by @dylan522p, this time from a Together.AI blog post from April. Once again, they’ve recreated an image and stamped their name on it, just like the last one they claimed was merely…
✨ NEW SWE-Agents BENCHMARK ✨
Introducing GSO: The Global Software Optimization Benchmark
- 👩🏻💻 100+ challenging software optimization tasks
- 🛣️ long-horizon tasks w/ precise specification
- 🐘 large code changes in Py, C, C++, ...
- 📉 SOTA models get < 5% success!
1/
It's easy to confuse Best@K vs Pass@K—and we've seen some misconceptions about our results. Our 59% on SWEBench-Verified is Pass@1 with Best@16, not Pass@8/16. Our Pass@8/16 is 67%/71%. So how did we achieve this? DeepSWE generates N candidate solutions. Then, another LLM…
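To make the distinction concrete, here is a minimal sketch with toy stand-ins (not our actual evaluation harness): Pass@K credits a problem if any of the K sampled patches passes the hidden tests, while Best@K first selects a single patch with a verifier that never sees the hidden tests and then reports whether that one patch passes, i.e., it is still a Pass@1 number.

```python
import random

def pass_at_k(candidates, resolves):
    """Pass@K: the task counts as solved if ANY of the K candidates
    resolves it (the hidden test oracle is applied to every candidate)."""
    return float(any(resolves(c) for c in candidates))

def best_at_k(candidates, verifier_score, resolves):
    """Best@K: pick ONE candidate with a verifier that never sees the hidden
    tests, then score only that single submission -- so it is still Pass@1."""
    best = max(candidates, key=verifier_score)
    return float(resolves(best))

# Toy stand-ins (illustrative only): 16 patches, one of which passes the hidden tests.
candidates = [f"patch_{i}" for i in range(16)]
resolves = lambda patch: patch == "patch_7"       # pretend hidden test suite
verifier_score = lambda patch: random.random()    # pretend LLM verifier score

print(pass_at_k(candidates, resolves))                  # 1.0: at least one sample passes
print(best_at_k(candidates, verifier_score, resolves))  # 1.0 only if the verifier picked patch_7
```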
Is it malpractice to report SOTA at pass@8 without evaluating other models at pass@8, or is it just standard practice at this point? It's clearly not SOTA if it's behind Devstral at pass@1.
RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground-truth answers? Introducing Self-Rewarding Training (SRT), where language models provide their own reward for RL training! 🧵 1/n
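One minimal sketch of how a model can reward itself without ground truth, assuming the reward comes from majority voting over its own sampled answers (an illustrative instantiation; the exact SRT recipe is in the thread/paper):

```python
from collections import Counter

def self_reward(samples, extract_answer):
    """Reward each sampled completion against the model's own consensus:
    the majority-voted answer serves as a pseudo-label (no ground truth used)."""
    answers = [extract_answer(s) for s in samples]
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == pseudo_label else 0.0 for a in answers]

# Toy usage: 5 sampled solutions to one prompt, final answer extracted from each.
samples = ["... so the answer is 42", "... 42", "... 41", "... 42", "... 7"]
rewards = self_reward(samples, extract_answer=lambda s: s.split()[-1])
print(rewards)  # [1.0, 1.0, 0.0, 1.0, 0.0]: the consensus answer "42" gets rewarded
```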
Amid all the recent excitement about RL, with lots of cool work and results, here is a reminder that RL with a reverse-KL regularizer to the base model cannot learn new skills that were not already present in the base model. It can only amplify existing weak skills.
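For concreteness, a sketch of the objective being referenced (standard KL-regularized RL, my notation) and why it can only reweight what the base model already supports:

```latex
% Standard reverse-KL-regularized RL objective (my notation; a sketch of the setup referenced above):
\[
\max_{\pi_\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot\mid x)}\!\left[ r(x,y) \right]
\;-\; \beta\,\mathrm{KL}\!\left( \pi_\theta(\cdot\mid x)\,\middle\|\,\pi_{\text{base}}(\cdot\mid x) \right)
\]
% Its optimum has the closed form
\[
\pi^\star(y \mid x) \;\propto\; \pi_{\text{base}}(y \mid x)\,\exp\!\left( r(x,y)/\beta \right),
\]
% so any completion with \pi_{\text{base}}(y \mid x) = 0 keeps probability 0:
% the policy can only reweight (amplify) behaviors the base model already assigns nonzero probability to.
```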
🌏 Building web-scale agents, and tired of Math and Coding tasks? Come chat with us at ICLR in Singapore. We are presenting InSTA at the DATA-FM workshop in the second Oral session, April 28th 2:30pm. InSTA is the largest environment for training agents, spanning 150k live…
the vLLM vs SGLang beef is the weirdest (and saddest) thing ever. Both are under the Linux Foundation; they could join forces and make the best inference framework ever :/
📢 LLM and RL folks! 📢 No good RL algorithm for credit assignment in multi-turn LLM agents on reasoning-heavy tasks? Don't even have a good benchmark for studying it? In SWEET-RL, we give you both (a vibe-coding benchmark and the SWEET algorithm). A thread 🧵 (1/n)
Today we’re launching INTELLECT-2: The first decentralized 32B-parameter RL training run open to join for anyone with compute — fully permissionless. Scaling towards frontier reasoning across coding, math and science.
We're trending on @huggingface models today! 🔥 Huge thanks to our amazing community for your support. 🙏
This week @encord_team hosted AI After Hours at @github HQ and our Foundation Model Lead, Vishal Satish, shared how Ambi Robotics is leveraging 200K+ hours of high-fidelity production data to train PRIME-1—a domain-expert foundation model designed for industrial reliability.
Preprint: Can we learn to reason for story generation (~100k tokens), without reward models? Yes! We introduce an RLVR-inspired reward paradigm VR-CLI that correlates with human judgements of quality on the 'novel' task of Next-Chapter Prediction. Paper: arxiv.org/abs/2503.22828
Excited to release R2E-Gym
- 🔥 8.1K executable environments using synthetic data
- 🧠 Hybrid verifiers for enhanced inference-time scaling
- 📈 51% success rate on SWE-Bench Verified
- 🤗 Open Source Data + Models + Trajectories
1/
🚀 We introduce DeepCoder-14B-Preview, a fully open-sourced coding model that is on par with o3-mini and o1! We scaled our model with RL magic up to 32K context. Its performance scales to 64K context 🔥
Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math. The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥 Links below:
I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models. We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses…
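A rough sketch of the idea as described, where the recurrent hidden state is replaced by the weights of a tiny inner model that takes one gradient step on a self-supervised loss at every token; the projections, loss, and sizes here are illustrative assumptions, not the exact architecture:

```python
import torch

class TTTStyleLayer(torch.nn.Module):
    """Sketch: the "hidden state" is the weight matrix W of a tiny inner model.
    At each step, W is updated by one gradient step on a self-supervised
    reconstruction loss for the current token, then queried to produce the output."""
    def __init__(self, dim: int, lr: float = 0.1):
        super().__init__()
        self.lr = lr
        # Projections defining the inner model's self-supervised task (illustrative).
        self.proj_k = torch.nn.Linear(dim, dim, bias=False)
        self.proj_v = torch.nn.Linear(dim, dim, bias=False)
        self.proj_q = torch.nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        W = torch.zeros(B, D, D, device=x.device)   # inner model's weights = the "hidden state"
        outputs = []
        for t in range(T):
            k = self.proj_k(x[:, t])                # input to the inner model
            v = self.proj_v(x[:, t])                # self-supervised target
            pred = torch.bmm(k.unsqueeze(1), W).squeeze(1)
            err = pred - v                          # grad of 0.5*||kW - v||^2 w.r.t. W is k^T err
            W = W - self.lr * torch.bmm(k.unsqueeze(2), err.unsqueeze(1))
            q = self.proj_q(x[:, t])                # query the just-updated inner model
            outputs.append(torch.bmm(q.unsqueeze(1), W).squeeze(1))
        return torch.stack(outputs, dim=1)

# Toy usage
layer = TTTStyleLayer(dim=16)
y = layer(torch.randn(2, 8, 16))
print(y.shape)  # torch.Size([2, 8, 16])
```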
DeepSeek just announced Inference-Time Scaling for Generalist Reward Modeling on Hugging Face. They show that SPCT significantly improves the quality and scalability of GRMs, outperforming existing methods and models on various RM benchmarks without severe biases, and could achieve…
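A rough sketch of the inference-time scaling part, under the assumption that the generative reward model samples several independent critique-plus-score judgments and aggregates the scores; `grm_sample` is a placeholder, not DeepSeek's API:

```python
import random

def scaled_reward(grm_sample, prompt, responses, k=8):
    """Sketch: sample k independent judgments from a generative reward model
    and aggregate the per-response scores, instead of trusting a single sample."""
    totals = {r: 0.0 for r in responses}
    for _ in range(k):
        scores = grm_sample(prompt, responses)   # one sampled judgment, e.g. {"resp_a": 7, "resp_b": 9}
        for r, s in scores.items():
            totals[r] += s
    return max(totals, key=totals.get)           # response preferred after aggregation

# Toy usage with a stand-in sampler (illustrative only).
fake_grm = lambda prompt, responses: {r: random.randint(1, 10) for r in responses}
print(scaled_reward(fake_grm, "prompt", ["resp_a", "resp_b"], k=16))
```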
Prompt-to-Leaderboard is now LIVE❤️🔥 Input any prompt → leaderboard for you in real-time. Huge shoutout to the incredible team that made this happen! @evan_a_frick @connorzchen @joseph_ten4849 @LiTianleli @infwinston @ml_angelopoulos @istoica05
Introducing Prompt-to-Leaderboard (P2L): a real-time LLM leaderboard tailored exactly to your use case! P2L trains an LLM to generate "prompt-specific" leaderboards, so you can input a prompt and get a leaderboard specifically for that prompt. The model is trained on the 2M…
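A minimal sketch of how I read the setup, assuming a prompt-conditional Bradley-Terry model: a trained network maps the prompt to one coefficient per LLM, the leaderboard is those coefficients sorted, and pairwise win probabilities come from a sigmoid of the difference (names below are placeholders):

```python
import math

def prompt_leaderboard(prompt, coef_model, model_names):
    """P2L-style sketch: a trained network maps a prompt to one Bradley-Terry
    coefficient per LLM; sorting the coefficients gives a prompt-specific leaderboard."""
    coefs = coef_model(prompt)                    # e.g., {"model_x": 1.3, "model_y": 0.2, ...}
    return sorted(model_names, key=lambda m: coefs[m], reverse=True)

def win_probability(coefs, model_a, model_b):
    """Under Bradley-Terry, P(model_a beats model_b | prompt) = sigmoid(beta_a - beta_b)."""
    return 1.0 / (1.0 + math.exp(-(coefs[model_a] - coefs[model_b])))

# Toy usage with a stand-in coefficient model (illustrative only).
fake_coef_model = lambda p: {"model_x": 1.3, "model_y": 0.2, "model_z": -0.5}
print(prompt_leaderboard("Write a SQL query...", fake_coef_model, ["model_x", "model_y", "model_z"]))
print(win_probability(fake_coef_model(""), "model_x", "model_y"))
```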