Sanjeev Arora
@prfsanjeevarora
Director, @PrincetonPLI and Professor @PrincetonCS. Seeks math/conceptual understanding of deep learning and large AI models. Also on the "other" social network
Really excited about the launch of this research initiative. Hiring Research Scientists now; Research Software Engineers and postdocs over the next few months. 300 H100 GPUs. Multidisciplinary teams. Princeton helps keep AI expertise in the open sphere. More: pli.princeton.edu
“The dramatic rise of AI capabilities…is a watershed event for humanity…It is also sure to transform research and teaching in every academic discipline.” – @prfsanjeevarora, director of the new @Princeton Language and Intelligence initiative. For more: pli.princeton.edu
I predict though that within the next year many other teams will achieve this milestone, and without using as much compute. Hoping Goedel Prover v3 from @PrincetonPLI will too.
Another AI system, ByteDance's SeedProver solved 4 out of 6 IMO problems *with* Lean, and solved a fifth with extended compute. This is becoming routine, like when we went to the moon for the fourth time. There is *nothing* "routine" about this!!...
Agree. Move away from open source will hurt US in the long run.
Kimi K2. More evidence that: • The lead of American “frontier” AI companies is rather small 🔬 • A broad ecosystem of strong foundation model companies is developing in China, with more players than in the US or elsewhere. 🐅 moonshotai.github.io/Kimi-K2/ github.com/MoonshotAI/Kim…
Also wanted to highlight the contributions made by amazing grad students and postdocs and collaborators. Especially @Yong18850571 @sangertang1999 ! 👏👋 Also, note that this is an AI model that is a **solver** of questions. It generates proofs, and is not a verifier of proofs.…
⏱️AI is making verification process easier, with models verifying proofs in minutes. 💻 Now, @prfsanjeevarora, @chijinML, @danqi_chen and @PrincetonPLI have released Goedel Prover V2, a model more efficient and more accurate than any previous model. 👉 blog.goedel-prover.com
Useful new SWE agent from @PrincetonPLI !
Releasing mini, a radically simple SWE-agent: 100 lines of code, 0 special tools, and gets 65% on SWE-bench verified! Made for benchmarking, fine-tuning, RL, or just for use from your terminal. It’s open source, simple to hack, and compatible with any LM! Link in 🧵
Analogous to what happened to some capabilities with the emergence of internet utilities in past decades (e.g., Google Search and Maps). Except now the effects happen across many other capabilities.
Is LLM use finally making me less capable? I started using LLMs three years ago for text and code gen. Now, I use several of them, for a ton more things. In fact, I feel like I use them for a huge fraction of the cognitive tasks that I perform that can be described in text.…
More impressive. But Lean provers have progressed a lot in the past 6-7 months, so that day isn't far off either blog.goedel-prover.com
Question: would it be more or less impressive if the IMO gold medals were achieved in Lean?
Everyone's talking about AI performance on the IMO. Let me highlight 🇨🇦Canadian 11th grader Warren Bei🇨🇦, one of five participants with a *perfect* 42/42. This is his *fifth* (and final) IMO representing Canada, with three golds and two silvers. (➡️ MIT undergrad in the fall)
Exactly. Thx
Prof Arora is simply pointing out an often-repeated move by skeptics of the current paradigm: "current capabilities are not evidence of complex abilities." But that's not the real claim. The real claim is that capabilities are arriving quickly. Imagine throwing an IMO problem at GPT-3.5.
Wish I were there! Let's catch up in Princeton
Just returned from ICML 2025 where I had the honor of keynoting three remarkable workshops. Grateful for the opportunity to delve into topics like self-evolving Alita agents, CRISPR-GPT for AI-driven science, Genome-Bench, reinforcement-learning agents, and AI biosafety. Special…
Agree!
Speculation: Within a year a <100B open weights model will also solve 5/6 IMO problems.
Congratulations on this milestone @demishassabis and GDM!
We achieved this year’s impressive result using an advanced version of Gemini Deep Think (an enhanced reasoning mode for complex problems). Our model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions –…
Completely misses the point. Nobody is suggesting that solving IMO problems is useful for math research. The point is that AI has become really good at complex reasoning, and is not just memorizing its training data. It can handle completely new IMO questions designed by a…
Quote of the day: I certainly don't agree that machines which can solve IMO problems will be useful for mathematicians doing research, in the same way that when I arrived in Cambridge UK as an undergraduate clutching my IMO gold medal I was in no position to help any of the…
A thoughtful analysis by @ErnestRyu but it is missing one key insight. To bring the AI model even to IMO gold level, one has to train it to generate new questions and then solve them. (There aren't enough human-generated questions for training.) This is a key idea in Deepmind's…
Two cents on AI getting International Math Olympiad (IMO) Gold, from a mathematician. Background: Last year, Google DeepMind (GDM) got Silver in IMO 2024. This year, OpenAI solved problems P1-P5 for IMO 2025 (but not P6), and this performance corresponds to Gold. (1/10)
Congratulations! Also thanks for making me win my bet with @JitendraMalikCV a year ahead of schedule.
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Thanks for your positive words. Numina's dataset release was a big enabler for this research area last year.
Impressive result! High performance with a low pass rate. Congrats to the Goedel Prover team
Next huge source of training data? Strong Indian students focus primarily on math/science. We should expect similar announcements from other AI actors with deep pockets.
Exciting news for students in India🇮🇳: get your free @GeminiApp Pro plan for 1 year! This gives you higher rate access to all our best models: 2.5 Pro, Veo 3, Deep Research, NotebookLM, and 2TB storage. Claim it at goo.gle/freepro - enjoy!
We’re proud that PLI students, post-docs, and faculty will be featuring over 20 papers at the @icmlconf in Vancouver this week! From safer AI agents to long-context reasoning and RL, we’re excited to showcase the cutting-edge research for you here: pli.princeton.edu/blog/2025/prin…
20+ papers (including several spotlights) from @PrincetonPLI being presented at ICML this week. pli.princeton.edu/blog/2025/prin…