Agentica Project
@Agentica_
Building generalist agents that scale @BerkeleySky
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE…

Excited to introduce DeepSWE-Preview, our latest model trained in collaboration with @Agentica_. Using only RL, we increase the performance of Qwen3-32B from 23% to 42.2% on SWE-Bench Verified!
Let's give a big round of applause for an amazing open-source release! They're not just sharing the model's weights; they're open-sourcing everything: the model itself, the training code (rLLM), the dataset (R2EGym), and the training recipe for complete reproducibility. 👏👏👏👏
🚀 Introducing rLLM: a flexible framework for post-training language agents via RL. It's also the engine behind DeepSWE, a fully open-sourced, state-of-the-art coding agent. 🔗 GitHub: github.com/agentica-proje… 📘 rLLM: pretty-radio-b75.notion.site/rLLM-A-Framewo… 📘 DeepSWE: pretty-radio-b75.notion.site/DeepSWE-Traini…
The first half of 2025 is all about reasoning models. The second half? It’s about agents. At Agentica, we’re thrilled to launch two major releases: 1. DeepSWE, our SOTA coding agent trained with RL that tops the SWEBench leaderboard for open-weight models. 2. rLLM, our agent…
We believe in experience-driven learning at the Sky lab. Hybrid verification plays an important role.
🚀 Introducing DeepSWE: Open-Source SWE Agent We're excited to release DeepSWE, our fully open-source software engineering agent trained with pure reinforcement learning on Qwen3-32B. 📊 The results: 59% on SWE-Bench-Verified with test-time scaling (42.2% Pass@1) - new SOTA…
🚀The era of overpriced, black-box coding assistants is OVER. Thrilled to lead the @Agentica_ team in open-sourcing and training DeepSWE, a SOTA software engineering agent trained end-to-end with @deepseek_ai-like RL on Qwen3-32B, hitting 59% on SWE-Bench-Verified and topping the…
Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. Built in…
It's easy to confuse Best@K with Pass@K, and we've seen some misconceptions about our results. Our 59% on SWEBench-Verified is Pass@1 with Best@16, not Pass@8/16. Our Pass@8/16 is 67%/71%. So how did we achieve this? DeepSWE generates N candidate solutions. Then, another LLM…
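To make the distinction concrete, here is a minimal Python sketch of the two metrics. The pass_at_k estimator is the standard unbiased one from the Codex paper; best_at_k is only a placeholder for DeepSWE's hybrid verifier (the "another LLM" judge mentioned above plus the hybrid verification from the earlier tweet), not the actual implementation, and rank_score/is_correct are hypothetical callables.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator (Chen et al., 2021): the chance that
    at least one of k samples, drawn from n generations of which c are
    correct, solves the task. Pass@8/16 credits ANY passing attempt."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def best_at_k(candidates, rank_score, is_correct) -> bool:
    """Best@k as described in the thread: generate k candidate patches,
    let a verifier rank them, and submit exactly ONE. Only that single
    submission is graded, so the headline number is still Pass@1.
    rank_score stands in for the hybrid verifier and is hypothetical."""
    chosen = max(candidates, key=rank_score)  # verifier picks one candidate
    return is_correct(chosen)                 # graded as a single attempt

# Example: with 16 samples of which 6 pass, Pass@8 ≈ 0.997, while a
# Best@16 run succeeds only if the verifier actually picks a passing one.

The key point is that Best@k is bounded above by Pass@k: the verifier can never do better than an oracle that always selects a correct candidate when one exists.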
Is it malpractice to report SOTA with pass@8 without evaluating other models at pass@8, or is that just standard practice at this point? It's clearly not SOTA if it's behind Devstral at pass@1.
We're trending on @huggingface models today! 🔥 Huge thanks to our amazing community for your support. 🙏

UC Berkeley open-sourced a 14B model that rivals OpenAI o3-mini and o1 on coding! They applied RL to Deepseek-R1-Distilled-Qwen-14B on 24K coding problems. It only cost 32 H100s for 2.5 weeks (~$26,880)! It's truly open-source. They released everything: the model, training…
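As a quick sanity check on that budget (a back-of-the-envelope sketch; the per-GPU-hour rate is inferred, not stated in the tweet):

# Numbers from the tweet; only the $/GPU-hour rate is derived here.
gpus = 32                          # H100 GPUs
hours = 2.5 * 7 * 24               # 2.5 weeks = 420 hours
gpu_hours = gpus * hours           # 13,440 GPU-hours total
implied_rate = 26_880 / gpu_hours  # ≈ $2.00 per GPU-hour
print(f"{gpu_hours:,.0f} GPU-hours x ${implied_rate:.2f}/GPU-hr = $26,880")

So the quoted ~$26,880 is consistent with roughly $2 per H100-hour, a plausible cloud rental rate.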
Our team has open-sourced our reasoning model that reaches o1 and o3-mini level on coding and math: DeepCoder-14B-Preview.
Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math. The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥 Links below: