Swarat Chaudhuri
@swarat
Professor at @UTCompSci, Research Scientist at @GoogleDeepmind. Automated Reasoning + Machine Learning + Programming Languages.
🔥 @GoogleDeepMind just dropped their "formal conjectures" project - formalizing statements of math's biggest unsolved mysteries in #LeanLang and #Mathlib! This Google-backed project is a HUGE step toward developing "a much richer dataset of formalized conjectures", valuable…
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
To any teams soon releasing formal IMO results that will also share PutnamBench results as part of their announcement: please *do not* publicly share PutnamBench proofs, to prevent contamination.
While IMO is trending, our model leads on college-level math (Putnam Benchmark)—nearly doubling the problems solved by prior SOTA, with formal, verifiable proofs! Moreover, it’s not just an announcement—you can actually download and use our model. 🙂
🔥Our Goedel-Prover-V2-32B topped the PutnamBench Leaderboard by solving 86 problems —nearly 2× more than the previous SOTA DeepSeek-Prover-V2-671B (solved 47), while using: * 1/20 the model size (32B vs. 671B) * 1/5 the passes (184 vs. 1024) Meanwhile, we also release *…
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
This past week, Harmonic had the opportunity to represent our advanced mathematical reasoning model, Aristotle, at the International Mathematical Olympiad, the most prestigious mathematics competition in the world. To uphold the sanctity of the student competition, the IMO Board…
Thrilled about this achievement by the Gemini Deep Think team! Over the last two years, we have seen extraordinary progress in both formal and informal math. For research-level problems that are out-of-distribution by definition, I suspect one would need to combine the two.
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Passionate about frontier AI models, classical symbolic reasoning, and safe/secure software? Consider applying for this position on AI-aided code analysis in my team at @GoogleDeepmind: job-boards.greenhouse.io/deepmind/jobs/…. The job is London-based, and the application deadline is July 14.
Just learned that @IsilDillig won the #SIGPLAN Robin Milner Young Researcher Award this year! 🎈 🍾 The award goes to one outstanding mid-career PL researcher each year, and it’s hard to think of a more deserving candidate. Congratulations, Isil! sigplan.org/Awards/Milner/
Every machine in a hospital that diagnoses your body without cutting you open is based on a principle of physics, discovered by a physicist who had no interest in medicine. If you think the world doesn’t need basic science, or that science has somehow failed you, think again.
🎂#LeanLang 0.1 was released 11 years ago today! While it's true that Lean had been in development for almost a year prior (earliest commit July 15, 2013!) the release of Lean 0.1 was a major milestone. The screenshot below is from a June 26th, 2014 snapshot of the…
I’m presenting Escher (trishullab.github.io/escher-web) at #CVPR2025 Saturday morning (Poster Session #3; #236). Escher builds a visual concept library with a vision‑language critic (no human labels needed). Swing by if you’d like to chat about program synthesis & multimodal reasoning!
🎓 Congrats to Ashish Sharma (@UW) on receiving the ACM Doctoral Dissertation Award for his dissertation, "Human-AI Collaboration to Support Mental Health and Well Being." 👏 Honorable Mentions: Alexander Kelley (@UofIllinois) and Sewon Min (@UCBerkeley)
The reception for the 2025 @GuggFellows was a mind-blowing experience 🤯. So many brilliant projects on topics from photography to poetry to astronomy to paleontology and more! Much gratitude to the Guggenheim Foundation for the support — never has it been more needed than now.
1/3 The US didn’t end up leading the world in computing by luck. It happened because it made long-term, public investments in basic research, especially through NSF. That’s what created the technology that today’s companies are built on.
Revoking visas to Chinese PhD students is economically shortsighted and inhumane. Most Chinese PhD students stay in the U.S. after graduation (first image, stats from 2022). They're staying and building technology in the U.S., not taking it to China. Immigrant students create…
Excited for the CLEVER Benchmark for verified code generation in Lean, led by @AmitayushThakur & team! 161 tasks! ✅ Fully verified — all correctness is machine-checked 📷 Leakage-resistant — specs are non-computable propositions, so models can't copy logic 🧠 Truly end-to-end…
1/🧵Excited to share CLEVER — a new benchmark for end-to-end verified code generation in Lean. Can we go from natural language to a formally verified Lean program? CLEVER puts this to the test. 📄 arxiv.org/abs/2505.13938 💻 github.com/trishullab/cle…
Wish there was a way to convey to people supporting the admin’s cuts to the National Science Foundation and science funding more broadly that they’re in the process of destroying American science for at least a generation. Many genuinely seem not to understand this is happening.
It was fun to chat with @sinhadipanjan about curiosity-driven exploration in AI for math and science.
#HTwknd ✨| 🧠 CAN AI THINK? Indian researcher Swarat Chaudhuri believes it soon might — not just process info, but be curious. READ full story in HT App ⏬ hindustantimes.com/lifestyle/art-… (@sinhadipanjan writes ✍🏻) | @swarat
AI increasingly excels at formal math, opening up a path to large-scale automated formal verification. However, most AI-for-math work focuses on pure math rather than verification. CLEVER, a sort of HumanEval for Lean, fills this gap. The SOTA here is weak, so there’s much to do!