Paul Gauthier

@paulgauthier

Entrepreneur, investor, advisor

Southern California

Joined April 2009

118 Following

9K Followers

Pinned

Paul Gauthier @paulgauthier · Jun 27

Aider v0.85.0 is out.
- Support for Responses API models like o3-pro and o1-pro.
- New Gemini 2.5 Pro models.
- Updated costs for o3.
- Repo-map & linting support for Clojure and MATLAB.
- Aider wrote 21% of the code in this release.
Full release notes: aider.chat/HISTORY.html

185

9.0K

Paul Gauthier @paulgauthier · Jul 18

Kimi K2 scored 59% on the aider polyglot coding benchmark. Full leaderboard: aider.chat/docs/leaderboa…

345

160.0K

Paul Gauthier @paulgauthier · Jul 11

Grok 4 scored 80% on the aider polyglot coding benchmark, with high reasoning effort. This puts Grok in 4th place on the leaderboard. Full leaderboard: aider.chat/docs/leaderboa…

608

87.0K

Paul Gauthier @paulgauthier · Jun 30

OpenAI's o3-pro set a new SOTA of 85% on the aider polyglot coding benchmark, running with "high" reasoning effort. Full leaderboard: aider.chat/docs/leaderboa…

560

45.0K

Paul Gauthier @paulgauthier · Jun 9

DeepSeek R1 0528 scored 71% on the aider polyglot coding benchmark. This is a significant increase over the prior release of R1. Full leaderboard: aider.chat/docs/leaderboa…

683

141

136.0K

Paul Gauthier @paulgauthier · Jun 9

Gemini 2.5 Pro 06-05 has set a new SOTA on the aider polyglot coding benchmark, scoring 83% with 32k thinking tokens. The default thinking mode, where Gemini self-determines the thinking budget, scored 79%. Full leaderboard: aider.chat/docs/leaderboa…

667

41.0K

Paul Gauthier @paulgauthier · May 30

Aider v0.84.0 is out with support for Claude 4 Opus and Sonnet and Gemini 2.5 Flash Preview 05-20. Aider wrote 79% of the code in this release. Full release notes: aider.chat/HISTORY.html

200

12.0K

Paul Gauthier @paulgauthier · May 26

Gemini 2.5 Flash 05-20 with 23k thinking tokens scored 55% on the aider polyglot coding benchmark. Without thinking, it scored 44%. Full leaderboard: aider.chat/docs/leaderboa…

258

19.0K

Paul Gauthier @paulgauthier · May 25

Claude 4 Opus scored 72% on the aider polyglot coding benchmark. Claude 4 Sonnet scored 61%. Both results are with 32k thinking tokens. Sonnet 4 seems to have underperformed Sonnet 3.7. Full leaderboard: aider.chat/docs/leaderboa…

640

138

216.0K

Paul Gauthier @paulgauthier · May 11

Aider just passed 1000000000000000 GitHub Stars! That's 2^15 or 32,768 stars in decimal. github.com/Aider-AI/aider
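The arithmetic in this post can be checked with a short sketch. It assumes the star count is written as a binary numeral, which the post implies with "in decimal" but does not state outright; the variable names are illustrative only.

```python
# Assumption: the star count in the post is a binary numeral.
stars_binary = "1000000000000000"

# Parse the string as base 2 to get the decimal value.
stars = int(stars_binary, 2)

# A 1 followed by 15 zeros in binary is 2 to the 15th power.
zero_count = len(stars_binary) - 1

print(zero_count, stars)
```

Parsing the string in base 2 gives the same value as 2 raised to the number of trailing zeros, which matches the 2^15 figure quoted in the post.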

240

10.0K

Paul Gauthier @paulgauthier · May 10

I was able to benchmark Qwen3 235B A22B via the official API. It scored 60% using diff and 62% using the whole edit format. The leaderboard and Qwen3 article have both been updated. aider.chat/docs/leaderboa… aider.chat/2025/05/08/qwe…

177

12.0K

Paul Gauthier @paulgauthier · May 9

Aider v0.83.0 is out with support for Qwen3 and Gemini 2.5 Pro Preview 05-06. It also includes a huge number of quality-of-life features, many from contributors. Thanks! Aider wrote 55% of the code in this release. Full release notes: aider.chat/HISTORY.html

169

9.0K

Paul Gauthier @paulgauthier · May 9

Gemini Pro is quite good at unified diffs. Not good enough to apply literally with patch, but aider has a very flexible udiff backend. I mostly use Gemini like:

aider --model gemini --edit-format udiff-simple

It benchmarks a bit worse, so I'm reluctant to make it the default.

163

9.0K

Paul Gauthier @paulgauthier · May 8

Gemini 2.5 Pro Preview 05-06 scored 77% on the leaderboard, coming in 2nd place close behind o3 (high). Full leaderboard: aider.chat/docs/leaderboa…

341

28.0K

Paul Gauthier @paulgauthier · May 8

The $6.32 benchmark cost for Gemini 2.5 Pro Preview 03-25 was incorrect. The true cost was higher, possibly significantly so. Unfortunately 03-25 is no longer available to re-run. The new 05-06 version costs $37 to run the benchmark. Root cause analysis: aider.chat/2025/05/07/gem…

472

51.0K