Ankesh Anand
@ankesh_anand
Research scientist @googledeepmind (Gemini Thinking & Post-Training), previously PhD @milamontreal. RL for Gemini 2.5 and Project Mariner. Opinions are my own.
2.5 Pro is our new frontier model: fresh big-model smell, with extremely strong reasoning/thinking capabilities. We report single-attempt (pass@1) scores for clean comparisons.

Kimi K2 tech report just dropped! Quick hits:
- MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale
- 20K+ tools, real & simulated: unlocking scalable agentic data
- Joint RL with verifiable + self-critique rubric rewards: alignment that adapts
-…
Here we go! A new 2.5 Pro with all-around capability improvements over previous versions.
- Much better at code editing now, SOTA on Aider (82.2). Try out this model in Cursor!
- #1 on WebDev Arena (surpassing Opus 4).
- Supports thinking budgets now (128 to 32k).
- Much better at…

📈📈📈
Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to score a non-trivial number of points (24.4%). The speed of progress is really mind-blowing.
The whole surprise over $5.5M was because everyone is anchored to Llama 3's compute efficiency. Wenfeng himself said it's about two generations behind frontier-lab numbers. Sonnet cost "tens of millions" of dollars. I hope we release the 2.0 Flash / Flash Thinking numbers as…
My thoughts on China, export controls, and two possible futures: darioamodei.com/on-deepseek-an…