Thomas Ahle
@thomasahle
Head of AI @NormalComputing. Ex @Meta, @BARCdk, SupWiz, @OxfordQuantum. Tweets on Math, AI, #dspy, Probability, ML, Algorithms and Randomness. Recently tensors.
I love retro websites, so I tried to design the Tensor Cookbook (tensorcookbook.com) after the aesthetic of the original 2006 Matrix Cookbook. I think it works pretty well, down to the purple text, Verdana font, and everything.
what is the best multi-agent observability tool? how do you get a good top-down view of 100 agents all talking to a parent or to each other?
Is anyone having success running a full Claude Code-style agentic coding loop against a local model? This feels like the ultimate challenge for local tool calling right now, as it requires potentially dozens of calls in a loop and good performance over a longer context.
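For concreteness, such a loop has roughly the shape sketched below, assuming a local OpenAI-compatible endpoint (vLLM, llama.cpp server, Ollama, etc.). The base URL, model name, and the single run_shell tool are illustrative assumptions, not Claude Code's actual internals.

```python
import json
import subprocess
from openai import OpenAI

# Point the standard client at a local OpenAI-compatible server (assumed URL).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# One illustrative tool; a real coding agent would expose several.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command in the repo; returns combined stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}]

def run_shell(command: str) -> str:
    proc = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=60)
    return (proc.stdout + proc.stderr)[-4000:]  # truncate to protect the context window

messages = [{"role": "user", "content": "Fix the failing test in tests/test_utils.py"}]
for _ in range(30):  # the "dozens of calls in a loop" part
    resp = client.chat.completions.create(model="local-model", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:  # no tool call means the model considers itself done
        print(msg.content)
        break
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": run_shell(**args),
        })
```

The hard part for local models is exactly what the tweet says: emitting well-formed tool calls dozens of times in a row while the message history (and thus the context) keeps growing.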
There will be *no more than 5 days* between the release of GSPO and its implementation in TRL
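For context, GSPO is essentially GRPO with importance ratios computed at the sequence level rather than per token, so one would expect it to land in TRL as an option on GRPOTrainer. A hypothetical sketch under that assumption (the importance_sampling_level kwarg is the assumed knob; the rest follows TRL's documented GRPO quickstart):

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompt dataset and length-based reward, as in the TRL GRPO quickstart.
dataset = Dataset.from_dict({"prompt": ["Write a haiku about RL."] * 16})

def reward_len(completions, **kwargs):
    return [-abs(50 - len(c)) for c in completions]

config = GRPOConfig(
    output_dir="gspo-demo",
    importance_sampling_level="sequence",  # sequence-level ratios = GSPO (assumed flag)
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=reward_len,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```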
If you are building agents, I highly recommend using the Claude Code SDK: docs.anthropic.com/en/docs/claude…
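A minimal sketch of what that looks like, assuming the Python claude-code-sdk's async query() interface; see the linked docs for the authoritative API:

```python
import asyncio
from claude_code_sdk import ClaudeCodeOptions, query

async def main():
    options = ClaudeCodeOptions(
        max_turns=3,  # cap the agentic loop for this demo
        system_prompt="You are a careful coding assistant.",
    )
    # Streams messages (assistant turns, tool use, final result) as they arrive.
    async for message in query(prompt="Add type hints to utils.py", options=options):
        print(message)

asyncio.run(main())
```

The appeal is that the SDK gives you the whole agent loop (tool execution, permissions, context management) instead of just a chat completion.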
did u know you can use the new Gemini image segmentation feature in… a lot of different ways
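One concrete use, sketched with the google-genai SDK: ask gemini-2.5-flash for masks and parse the JSON it returns. The prompt wording and the box_2d / label / mask field names follow Google's published examples but should be treated as assumptions here:

```python
import json
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment
image = types.Part.from_bytes(data=open("scene.jpg", "rb").read(), mime_type="image/jpeg")

prompt = (
    "Give segmentation masks for every mug in the image. Output a JSON list "
    "where each entry has 'box_2d', 'label', and 'mask' (a base64-encoded PNG)."
)
resp = client.models.generate_content(model="gemini-2.5-flash", contents=[image, prompt])

# In practice the model may wrap the JSON in markdown fences; strip those first.
masks = json.loads(resp.text)
for m in masks:
    print(m["label"], m["box_2d"])
```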
After 78 years, an exponential improvement for Ramsey numbers was found by Jie Ma, Wujie Shen, and Shengjie Xie. gilkalai.wordpress.com/2025/07/23/ama…
the openai IMO news hit me pretty heavy this weekend. i'm still in the acute phase of the impact, i think. i consider myself a professional mathematician (a characterization some actual professional mathematicians might take issue with, but my party my rules) and i don't think i…
We had IMO gold at home the whole time!
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
anyway, none of this is possible because of the data processing inequality
Days after the DeepMind/OpenAI results, the Kimi K2 paper already details how to do RL with non-verifiable rewards! So much impressive stuff in this paper.
Kimi K2 paper dropped! describes:
- MuonClip optimizer
- large-scale agentic data synthesis pipeline that systematically generates tool-use demonstrations via simulated and real-world environments
- an RL framework that combines RLVR with a self-critique rubric reward mechanism…
Interesting: so GDM ran multiple models, and likely OAI did too. It would be enlightening for both companies to report the results of all the models they tried, not just the ones that scored highest!
Likewise, we evaluated the model without RAG or tools!