Ahmad Beirami
@abeirami
something new | Ex-{@GoogleDeepMind/@GoogleResearch, @AIatMeta, @EA, @MIT, @Harvard, @DukeU} | @GeorgiaTech PhD | Woman, Life, Freedom
After three incredible years, today is my last day at Google DeepMind! I am truly grateful to the amazing colleagues who made the journey 1000x more fruitful and enjoyable! I am forever indebted to my collaborators who showed me how to be better at everything via demonstrations.

Looks like the context for this is now clear!
Reciprocal reviewing is a terrible idea that unfortunately more conferences are adopting, as if we didn't already have enough problems with review quality from people who are willing to review.
In addition to being impressive work, it also shows that progress in general thinking methods hinges on correctness verification. The public Gemini 2.5 Pro is capable of generating the correct solution cheaply with a simple scaffold and routing prompts. A verifier needs to be…
🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
We should build anti-virus software that detects such prompt injections and flags them. This should be integrated into the conference PDF upload page.
ICML’s Statement about subversive hidden LLM prompts. We live in a weird timeline…
🚨New Paper!🚨 We trained reasoning LLMs to reason about what they don't know. o1-style reasoning training improves accuracy but produces overconfident models that hallucinate more. Meet RLCR: a simple RL method that trains LLMs to reason and reflect on their uncertainty --…
🚀 Excited to share that the Workshop on Mathematical Reasoning and AI (MATH‑AI) will be at NeurIPS 2025! 📅 Dec 6 or 7 (TBD), 2025 🌴 San Diego, California
This applies to any other task you delegate to AI. If you can't or don't verify the work, then you're using the models wrongly! If the task is too big to verify, then you're using the models wrongly too!
If you want to start a software startup, you should still learn to program. Even if AI writes most of your code, you'll still be in the position of an engineering manager, and to be a good engineering manager you have to be a programmer yourself.
Hiring! We're looking to fill contractor Research Engineer roles in New York City to work with us in FAIR on AI Research Agents. If that sounds fun, please fill out the expression of interest here: forms.gle/7m4fVqLXY5GwuL…
One way to think about it: I like exercising - lifting some weights & running. But a crane lifts more than me, and a car goes faster than me. This takes nothing from the sheer human joy of exercise. Also fast cars add to our joy of superhuman speed. Same w/ math. And chess & go.
the openai IMO news hit me pretty heavy this weekend. i'm still in the acute phase of the impact, i think. i consider myself a professional mathematician (a characterization some actual professional mathematicians might take issue with, but my party my rules) and i don't think i…
To me this doesn't refute, and in fact verifies, the claim that Gemini 2.5 Pro with a relatively simple agentic scaffold and a researcher's budget is capable of getting gold, which is the impressive finding here! - The problem-specific hints are quite generic. What's missing is a…
Interesting approach! However, we looked at the proofs and methodology and we found a few problems, specifically with the use of hints given to the model. While the scaffold indeed improves performance, it does not solve all problems accurately and would not get a gold medal.🧵
Great tips on how to do research!
Back in grad school, when I realized how the “marketplace of ideas” actually works, it felt like I’d found the cheat codes to a research career. Today, this is the most important stuff I teach students, more than anything related to the substance of our research. A quick…
Important lessons on rigorous evaluation of AI model behaviors, drawing on the historical example (and fun story) of hype around "chimps learning language". Given the importance of AI safety research, rigor and credibility are absolutely necessary. A great read from the folks at…
In a new paper, we examine recent claims that AI systems have been observed ‘scheming’, or making strategic attempts to mislead humans. We argue that to test these claims properly, more rigorous methods are needed.
Very nice talk, especially if you are new to publishing!
Had an amazing time @NewInML @icmlconf giving a talk on "What I Wish I knew before starting a PhD (but learnt the hard way)"! Loved the post-talk discussions and the heartwarming messages :) Sharing slides since some people asked, link in the tweet below 👇
The best research questions arise from engaging hands-on with working systems & experiencing the issues; abstracting them into well-defined technical problems, and only then thinking about solutions! P.S. Most high value work in industry actually involves smart "data cleaning."
Academia must be the only industry where extremely high-skilled PhD students spend much of their time doing low-value work (like data cleaning). A 1st year management consultant outsources this immediately. Imagine the productivity gains if PhDs could focus on thinking.