Machine Learning Street Talk
@MLStreetTalk
MLST is by Dr. Tim Scarfe @ecsquendor w/ cameos from @DoctorDuggar https://www.patreon.com/mlst (early access/private discord) - Sponsor us!
Superman III was a powerful metaphor for our relationship with technology. The true danger today isn't the machines actively taking over, but rather us being "sucked into the machine", gradually losing our authenticity and agency to become components of a larger technological…
Quick thread on the recent IMO results and the relationship between symbol manipulation, reasoning, and intelligence in machines and humans:
🚀Introducing Hierarchical Reasoning Model🧠🤖 Inspired by brain's hierarchical processing, HRM delivers unprecedented reasoning power on complex tasks like ARC-AGI and expert-level Sudoku using just 1k examples, no pretraining or CoT! Unlock next AI breakthrough with…
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team! deepmind.google/discover/blog/…
Yes, there is an official marking guideline from the IMO organizers which is not available externally. Without the evaluation based on that guideline, no medal claim can be made. With one point deducted, it is a Silver, not Gold.
🚨 According to a friend, the IMO asked AI companies not to steal the spotlight from kids and to wait a week after the closing ceremony to announce results. OpenAI announced the results BEFORE the closing ceremony. According to a Coordinator on Problem 6, the one problem OpenAI…
We can finally share this now: A Gemini model trained with new RL techniques and scaled-up inference-time compute has achieved gold-medal level performance at IMO 2025! 🥇
Google got the IMO gold! I heard they delayed announcing out of respect for the human participants. Excellent job 🙏
Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level in the International Mathematical Olympiad! 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this…
Yes. Writing is not a second thing that happens after thinking. The act of writing is an act of thinking. Writing *is* thinking. Students, academics, and anyone else who outsources their writing to LLMs will find their screens full of words and their minds emptied of thought.
Apple Intelligence Foundation Language Models: Tech Report 2025 "We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through…
Not falling for OpenAI’s hype-vague posting about the new IMO gold model with “general purpose RL” and whatever else “breakthrough.” Google also got IMO gold (harder than mastering AIME), but remember, simple ideas scale best.
Sometimes it is important to take a moment and celebrate -- we achieved all of this in 3 years. Pretty incredible impact from @Cohere_Labs 🔥
Not Even Bronze: Evaluating LLMs on 2025 International Math Olympiad 🥉 matharena.ai/imo/ Nice blog post from the team behind MathArena: Evaluating LLMs on Uncontaminated Math Competitions (arxiv.org/abs/2505.23281) providing independent analysis of LLM performance on IMO.
Intelligence isn't a collection of skills. It's the efficiency with which you acquire and deploy new skills. It's an efficiency ratio. And that's why benchmark scores can be very misleading about the actual intelligence of AI systems.
Two cents on AI getting International Math Olympiad (IMO) Gold, from a mathematician. Background: Last year, Google DeepMind (GDM) got Silver in IMO 2024. This year, OpenAI solved problems P1-P5 for IMO 2025 (but not P6), and this performance corresponds to Gold. (1/10)
Terence Tao quietly seething at OpenAI is a sight to behold.
"I use AI in a separate window. I don't enjoy Cursor or Windsurf, I can literally feel competence draining out of my fingers." @dhh, the legendary programmer and creator of Ruby on Rails, has the most beautiful and philosophical idea about what AI takes away from programmers.
Just read @OpenAI's solution to IMO Problem 1. The math checks out—it nailed the key lemma: for n > 3, any n-line cover of P_n must include a triangle side (i.e. a non-sunny line). That reduces the problem to n = 3, where everything becomes casework. Clean move. However the…
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Speaking as a past IMO contestant, this is impressive but misleading - gold vs silver is meaningless, 1 pt below gold vs borderline gold is noise. The impressive bit is using a general reasoning model, not a specialised system, and no verified reward. Peak AI maths is unchanged.
Fantastic result from OAI achieving IMO gold medal level performance. However -- we disagree with the premise that superhuman performance on specific technical math questions implies opening the floodgates to scientific discovery. IMO - the harder part is (creatively) coming…
Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline 🧵
If we compared AI capabilities against humans with no access to tools, such as the internet, we would probably find that AI already outperformed humans at many or most cognitive tasks we perform at work. But of course this is not a helpful comparison and doesn’t tell us much…
Great nuanced take from Arvind and Sayash (as usual)
Some aspects of AI discourse seem to come from a different planet, oblivious to basic realities on Earth. AI for science is one such area. In this new essay, @sayashk and I argue that visions of accelerating science through AI should be considered unserious if they don't confront…