Alex Gu
@minimario1729
intern @ meta, mit phd student (on job market?), llm for math+code / prev nvidia, aws, jane street / enjoys 🎹✈️⛷️⛵
Thanks to MIT News for covering our vision of AI for code! A lot of progress made, but still a long way to go!
Can AI actually code for us? 🧵 MIT research reveals there’s a "long way to go" due to bottlenecks like assessment, codebase scale, & incorrect retrievals. The work reflects a vision to let humans focus on high-level design while routine work is automated:…
postering this work on behalf of awesome coauthors at the ai for math workshop tomorrow :)
Do LLMs truly understand math proofs, or just guess? 🤔 Our new study on #IneqMath dives deep into Olympiad-level inequality proofs & reveals a critical gap: LLMs are often good at finding answers, but struggle with rigorous, sound proofs. ➡️ ineqmath.github.io To tackle…
come to our ai for math workshop tomorrow it'll be super fun!! 🎉🎉

ai for math workshop papers released, it's a fun batch🚀 openreview.net/group?id=ICML.…
Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data…
Introducing FlexOlmo, a new paradigm for language model training that enables the co-development of AI through data collaboration. 🧵
pls review for our workshop 🥺
We are looking for more reviewers. If you are interested, please fill out this form: docs.google.com/forms/d/e/1FAI…
i'm not sure this duolingo course is that popular but it filled my need
congrats to lean!! it's amazing and this is so well deserved 😀
Incredibly grateful to @TheOfficialACM SIGPLAN for awarding #LeanLang the Programming Languages Software Award 2025 at #PLDI2025! 🎉 "The Lean theorem prover is a remarkable software artifact... Lean has had and continues to have a broad impact on industrial practice and…
interesting analysis 👀 more people should go read livecodebench outputs rather than just compare numbers!!
To think or not to think -- what distinguishes the two Opus 4 variants? In practice, I haven't found much of a difference between the Opus thinking/non-thinking variants. I looked into the LCB results as a clear point of comparison.
anyone with cool projects in ai for math, pls submit to our workshop @ icml! 🥺 deadline june 21 !!
The paper submission deadline is within 1 week: June 21st, 2025, AoE. The challenge deadline is in 2 weeks: July 1st, 2025, AoE. 📢 Call for Papers: sites.google.com/view/ai4mathwo… 🎯 Challenge 1, File-level Automated Proof Engineering (APE) of Formal Math Libraries (APE-Bench I):…
👀can your language model solve this inequality? 👋check out ineqmath, our new challenging benchmark containing 200 high-school olympiad inequalities, with leading models scoring under half! also fun for humans to try😝
Do LLMs truly understand math proofs, or just guess? 🤔 Our new study on #IneqMath dives deep into Olympiad-level inequality proofs & reveals a critical gap: LLMs are often good at finding answers, but struggle with rigorous, sound proofs. ➡️ ineqmath.github.io To tackle…
great to see our AI for SWE paper is applicable not just for researchers, but superstar startup founders too! 😀
my current agent reading list: • Challenges and Paths Towards AI for Software Engineering by @minimario1729 arxiv.org/abs/2503.22625 • Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents by @pranav__putta arxiv.org/abs/2408.07199 what are your fav agent papers?
im (still) in the bay area, if you're around let's hang out! (and always looking for new activity inspiration 😀)

new deepseek release almost on par with o3 (high) on livecodebench 😲🚀

Glad to announce the ICML 2025 Challenge on Automated Math Reasoning and Extensions! 🌟🧮⚛️ Track 1: File-level Automated Proof Engineering (APE) of Formal Math Libraries (APE-Bench I). Participation: codabench.org/competitions/8… Track 2: Physics Reasoning with Diagrams and…
come hear my thoughts on the future of AI for SWE @ iclr tomorrow! 🔮 🤖
What's the future of AI for Software Engineering? 🤖 Join Alex Gu (@minimario1729) (MIT; StarCoder, CRUXEval, LeanDojo contributor) tomorrow at the #DL4C workshop! He'll cover current challenges in AI for SE and promising directions for what lies ahead. #ICLR2025 #ICLR