Azalia Mirhoseini
@Azaliamirh
Asst. Prof. of CS at Stanford, Google DeepMind. Prev: Anthropic, Google Brain. Co-Creator of MoEs, AlphaChip, Test Time Scaling Laws.
Excited to release SWiRL: A synthetic data generation and multi-step RL approach for reasoning and tool use! With SWiRL, the model’s capability generalizes to new tasks and tools. For example, a model trained to use a retrieval tool to solve multi-hop knowledge-intensive…
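To make the multi-step idea concrete, here is a rough sketch of how a SWiRL-style pipeline might generate synthetic tool-use trajectories; the function names and the SEARCH[...] action format are illustrative assumptions, not the paper's actual interface.

```python
# Hypothetical sketch of SWiRL-style multi-step data generation (names are
# illustrative, not the paper's API). The model alternates between reasoning
# and tool calls; each intermediate step becomes its own training example
# for step-wise RL, rather than only rewarding the final answer.

from dataclasses import dataclass

@dataclass
class Step:
    prompt: str   # context seen by the model at this step
    action: str   # model output: either a tool call or a final answer

def call_model(prompt: str) -> str:
    """Placeholder for sampling from the base LLM."""
    return "SEARCH[who discovered penicillin]"

def call_tool(query: str) -> str:
    """Placeholder for the retrieval tool."""
    return "Alexander Fleming discovered penicillin in 1928."

def generate_trajectory(question: str, max_steps: int = 4) -> list[Step]:
    context, steps = question, []
    for _ in range(max_steps):
        action = call_model(context)
        steps.append(Step(prompt=context, action=action))
        if action.startswith("SEARCH["):
            observation = call_tool(action[len("SEARCH["):-1])
            context = f"{context}\n{action}\n{observation}"
        else:  # model produced a final answer, stop the rollout
            break
    return steps
```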

Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level at the International Mathematical Olympiad 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this…
Super thrilled to share that our AI has now reached silver medalist level in Math at #imo2024 (1 point away from 🥇)! Since Jan, we not only have a much stronger version of #AlphaGeometry, but also an entirely new system called #AlphaProof, capable of solving many more…
Excited to share that a scaled up version of Gemini DeepThink achieves gold-medal standard at the International Mathematical Olympiad. This result is official, and certified by the IMO organizers. Watch this space, more to come soon! deepmind.google/discover/blog/…
If you want to learn about the power (laws) of large language monkeys (and get a free banana 🍌), come to our poster at #ICML2025!!
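For context, the quantity behind the "power laws of large language monkeys" is coverage: the probability that at least one of k repeated samples solves a problem. Below is a minimal sketch of the standard unbiased pass@k estimator (my own illustration, not the paper's code).

```python
# Coverage (pass@k): chance that at least one of k independent samples is
# correct, estimated without bias from n samples of which c were correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of P(at least one of k samples is correct)."""
    if n - c < k:        # every size-k subset must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 3 correct out of 100 samples; coverage grows as k increases.
for k in (1, 10, 100):
    print(k, round(pass_at_k(100, 3, k), 3))
```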
can't stop thinking about this one. insanely elegant, seems insanely powerful
At #ICML2025 in Vancouver 🇨🇦 this week, presenting some work from my first year at Stanford! Come find me at posters or just around the conference!
Thursday: KernelBench: Can LLMs Write Efficient GPU Kernels? 11AM East E-2010
Saturday: Kevin: Multi-Turn RL for Generating…
Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!

I’ve joined @aixventureshq as a General Partner, working on investing in deep AI startups. Looking forward to working with founders on solving hard problems in AI and seeing products come out of that! Thank you @ychernova at @WSJ for covering the news: wsj.com/articles/ai-re…
So excited to speak tomorrow about Think Prune Train at the LAD'25 session on Reasoning and Self Improvement! iclad.ai
Interesting tidbit from prof @chrmanning: The first mention of “Large Language Model” comes from a 1998 NLP workshop in Taiwan! Paper by Chun-Liang Chen, Bo-Ren Bai, Lee-Feng Chien, Lin-Shan Lee. “Large” in 1998 = a 20M-word corpus
Shrinking the Generation-Verification Gap with Weak Verifiers "we introduce Weaver, a framework for designing a strong verifier by combining multiple weak, imperfect verifiers." "Weaver leverages weak supervision to estimate each verifier’s accuracy and combines their outputs…
Very exciting work on using weak supervision for RL, closing the “generation-verification gap”!! Once again, principled approaches to labeling/data development are the key!
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
See @JonSaadFalcon's post for more details: x.com/JonSaadFalcon/…
Paper: arxiv.org/abs/2506.18203
Blog: hazyresearch.stanford.edu/blog/2025-06-1…
github.com/HazyResearch/s…
Datasets and Models: huggingface.co/collections/ha…
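A toy sketch of the core Weaver idea as I read it (not the actual implementation): weight each weak verifier's vote by its estimated accuracy and pick the candidate answer with the highest combined score. The names and numbers below are made up for illustration.

```python
# Combine several weak, imperfect verifiers: each 0/1 vote is weighted by the
# verifier's estimated accuracy (e.g. from weak supervision), then the
# candidate with the highest combined log-odds score wins.
import math

def combine_verifier_scores(votes: dict[str, list[int]],
                            accuracies: list[float]) -> str:
    """votes maps each candidate answer to one 0/1 vote per verifier;
    accuracies[i] is the estimated accuracy of verifier i."""
    def log_odds(candidate_votes: list[int]) -> float:
        score = 0.0
        for v, acc in zip(candidate_votes, accuracies):
            weight = math.log(acc / (1.0 - acc))  # confident verifiers count more
            score += weight if v == 1 else -weight
        return score
    return max(votes, key=lambda ans: log_odds(votes[ans]))

# Example: two candidate answers, three verifiers with estimated accuracies.
votes = {"answer_A": [1, 1, 0], "answer_B": [0, 1, 1]}
print(combine_verifier_scores(votes, accuracies=[0.9, 0.6, 0.7]))  # answer_A
```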
Congratulations, @CaiaCostello and Adrian!
So proud of @CaiaCostello who graduated with her CS master's from @stanfordeng 🎓 today! Lucky to have helped her with the TPT project along with @annadgoldie and @Azaliamirh. This is from her presenting the TPT poster at the ICLR 🇸🇬 workshop!
Congratulations, Dr. Goldie! @annadgoldie
Huge congratulations to @annadgoldie on receiving her @Stanford PhD today! It’s been a great journey!
This is a proper vibe-coding setup for GPU programmers, and it can get you surprisingly far! I honestly think that if this authoring experience is v1, then v10 might become the normal way GPU experts start writing serious custom kernels! Great work @anneouyang! (finally…
✨ New blog post 👀: We have some very fast AI-generated kernels, found with a simple test-time-only search. They perform close to, and in some cases even beat, the standard expert-optimized production kernels shipped in PyTorch. (1/6) [🔗 link in final post]
Go, @realSharonZhou and team! Congrats to @LisaSu and AMD on such an amazing addition!
Welcome aboard @realSharonZhou! So happy to have you and the team joining us as we bring @AIatAMD to the world!!!
I like this idea very much and have long advocated for something like this. A synthetically enriched «KV prefix» is a natural augmentation to modern long-context models.
Cartridges: Storing long contexts in tiny caches with self-study
- train-once, reusable memory via SELF-STUDY
- 38.6× less memory, 26.4× higher throughput
- extends context to 484k, composes across corpora
- outperforms LoRA, DuoAttention, and standard ICL
BLOG:…
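For readers unfamiliar with the baseline being improved on, here is a minimal sketch of the plain precompute-and-reuse KV-prefix pattern, assuming the Hugging Face transformers cache-reuse API; Cartridges goes further by training a much smaller cache via self-study instead of caching the raw corpus. The model name and corpus string are placeholders.

```python
# Precompute the KV cache for a long corpus once, then reuse it across many
# queries. This is the plain "KV prefix" baseline, not Cartridges itself.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

corpus = "..."  # long document whose encoding we want to amortize over queries
corpus_ids = tok(corpus, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(corpus_ids, use_cache=True).past_key_values

def answer(query: str) -> str:
    ids = torch.cat([corpus_ids, tok(query, return_tensors="pt").input_ids], dim=-1)
    # Deep-copy the cache so each query starts from the same clean prefix.
    out = model.generate(ids, past_key_values=copy.deepcopy(prefix_cache),
                         max_new_tokens=32)
    return tok.decode(out[0, ids.shape[-1]:], skip_special_tokens=True)
```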
Excited to be presenting our new work, HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation, at #CVPR2025 this week. VAR (Visual Autoregressive Modelling) introduced a very nice way to formulate autoregressive image generation as a next-scale prediction task (from…
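As a rough mental model of next-scale prediction (my own sketch, not HMAR's code): instead of emitting image tokens one at a time in raster order, the model emits an entire token map at each successively finer scale, conditioned on all coarser maps.

```python
# Conceptual sketch of VAR-style next-scale prediction: generate token maps
# coarse-to-fine, each conditioned on everything generated so far, then decode
# the finest map back to pixels with a (VQ)VAE decoder.
import torch

scales = [1, 2, 4, 8]   # side lengths of the token maps, coarse to fine
generated = []          # token maps produced so far

def predict_next_scale(prefix_maps: list[torch.Tensor], side: int) -> torch.Tensor:
    """Stand-in for the transformer: returns a (side x side) map of token ids."""
    return torch.zeros(side, side, dtype=torch.long)

for side in scales:
    next_map = predict_next_scale(generated, side)  # condition on coarser maps
    generated.append(next_map)
# generated[-1] is the finest-scale token map, ready for the decoder.
```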