Noam Brown
@polynoamial
Researching reasoning @OpenAI | Co-created Libratus/Pluribus superhuman poker AIs, CICERO Diplomacy AI, and OpenAI o3 / o1 / 馃崜 reasoning models
Today, I鈥檓 excited to share with you all the fruit of our effort at @OpenAI to create AI models capable of truly general reasoning: OpenAI's new o1 model series! (aka 馃崜) Let me explain 馃У 1/

On IMO P6 (without going into too much detail about our setup), the model "knew" it didn't have a correct solution. The model knowing when it didn't know was one of the early signs of life that made us excited about the underlying research direction!
One piece of info that seems important to me in terms of forecasting usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it? 馃У
I鈥檓 giving a talk on the speed of progress on LLM capabilities in 3 hours, gotta update the slides 馃槶馃槶
1/N I鈥檓 excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world鈥檚 most prestigious math competition鈥攖he International Math Olympiad (IMO).
It鈥檚 truly a privilege to be able to wake up every morning, see where the latest intelligence frontier is, and help push it a little further.
Their bet allowed for formal math AI systems (like AlphaProof). In 2022, almost nobody thought an LLM could be IMO gold level by 2025.
We are seeing much faster AI progress than **Paul Christiano** and **Yudkowsky** predicted, who had gold in 2025 at 8% and 16% respectively, by methods that are more general than expected
It takes us a few months to turn the experimental research frontier into a product. But progress is so fast that a few months can mean a big difference in capabilities.
So, all the models underperform humans on the new International Mathematical Olympiad questions, and Grok-4 is especially bad on it, even with best-of-n selection? Unbelievable!
Sheryl (@sherylhsu02) was our first hire onto the multi-agent team. Within a few months of joining, she helped to make this possible. We're so lucky to have her on the team!
Watching the model solve these IMO problems and achieve gold-level performance was magical. A few thoughts 馃У