Azalia Mirhoseini
@Azaliamirh
Asst. Prof. of CS at Stanford, Google DeepMind. Prev: Anthropic, Google Brain. Co-Creator of MoEs, AlphaChip, Test Time Scaling Laws.
Excited to release SWiRL: A synthetic data generation and multi-step RL approach for reasoning and tool use! With SWiRL, the model’s capability generalizes to new tasks and tools. For example, a model trained to use a retrieval tool to solve multi-hop knowledge-intensive…
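To make the multi-step idea concrete, here is a rough sketch of how a SWiRL-style pipeline might generate synthetic tool-use trajectories; the function names and the SEARCH[...] action format are illustrative assumptions, not the paper's actual interface.

```python
# Hypothetical sketch of SWiRL-style multi-step data generation (names are
# illustrative, not the paper's API). The model alternates between reasoning
# and tool calls; each intermediate step becomes its own training example
# for step-wise RL, rather than only rewarding the final answer.

from dataclasses import dataclass

@dataclass
class Step:
    prompt: str   # context seen by the model at this step
    action: str   # model output: either a tool call or a final answer

def call_model(prompt: str) -> str:
    """Placeholder for sampling from the base LLM."""
    return "SEARCH[who discovered penicillin]"

def call_tool(query: str) -> str:
    """Placeholder for the retrieval tool."""
    return "Alexander Fleming discovered penicillin in 1928."

def generate_trajectory(question: str, max_steps: int = 4) -> list[Step]:
    context, steps = question, []
    for _ in range(max_steps):
        action = call_model(context)
        steps.append(Step(prompt=context, action=action))
        if action.startswith("SEARCH["):
            observation = call_tool(action[len("SEARCH["):-1])
            context = f"{context}\n{action}\n{observation}"
        else:  # model produced a final answer, stop the rollout
            break
    return steps
```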

Very excited to share that an advanced version of Gemini Deep Think is the first to have achieved gold-medal level at the International Mathematical Olympiad 🏆, solving five out of six problems perfectly, as verified by the IMO organizers! It’s been a wild run to lead this…
Super thrilled to share that our AI has now reached silver medalist level in Math at #imo2024 (1 point away from 🥇)! Since Jan, we not only have a much stronger version of #AlphaGeometry, but also an entirely new system called #AlphaProof, capable of solving many more…
Excited to share that a scaled up version of Gemini DeepThink achieves gold-medal standard at the International Mathematical Olympiad. This result is official, and certified by the IMO organizers. Watch this space, more to come soon! deepmind.google/discover/blog/…
If you want to learn about the power (laws) of large language monkeys (and get a free banana 🍌), come to our poster at #ICML2025!!
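For context, the quantity behind the "power laws of large language monkeys" is coverage: the probability that at least one of k repeated samples solves a problem. Below is a minimal sketch of the standard unbiased pass@k estimator (my own illustration, not the paper's code).

```python
# Coverage (pass@k): chance that at least one of k independent samples is
# correct, estimated without bias from n samples of which c were correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of P(at least one of k samples is correct)."""
    if n - c < k:        # every size-k subset must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 3 correct out of 100 samples; coverage grows as k increases.
for k in (1, 10, 100):
    print(k, round(pass_at_k(100, 3, k), 3))
```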
can't stop thinking about this one. insanely elegant, seems insanely powerful
At #ICML2025 in Vancouver 🇨🇦 this week, presenting some work from my first year at Stanford! Come find me at posters or just around the conference!
Thursday: KernelBench: Can LLMs Write Efficient GPU Kernels? 11AM East E-2010
Saturday: Kevin: Multi-Turn RL for Generating…
Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!

I’ve joined @aixventureshq as a General Partner, working on investing in deep AI startups. Looking forward to working with founders on solving hard problems in AI and seeing products come out of that! Thank you @ychernova at @WSJ for covering the news: wsj.com/articles/ai-re…
So excited to speak tomorrow about Think Prune Train at the LAD'25 session on Reasoning and Self Improvement! iclad.ai
Interesting tidbit from prof @chrmanning: The first mention of “Large Language Model” comes from a 1998 NLP workshop in Taiwan! Paper by Chun-Liang Chen, Bo-Ren Bai, Lee-Feng Chien, Lin-Shan Lee. “Large” in 1998 = a 20M-word corpus
Shrinking the Generation-Verification Gap with Weak Verifiers "we introduce Weaver, a framework for designing a strong verifier by combining multiple weak, imperfect verifiers." "Weaver leverages weak supervision to estimate each verifier’s accuracy and combines their outputs…
Very exciting work on using weak supervision for RL, closing the “generation-verification gap”!! Once again, principled approaches to labeling/data development are the key!
How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
See @JonSaadFalcon's post for more details: x.com/JonSaadFalcon/…
Paper: arxiv.org/abs/2506.18203
Blog: hazyresearch.stanford.edu/blog/2025-06-1…
github.com/HazyResearch/s…
Datasets and Models: huggingface.co/collections/ha…
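A toy sketch of the core Weaver idea as I read it (not the actual implementation): weight each weak verifier's vote by its estimated accuracy and pick the candidate answer with the highest combined score. The names and numbers below are made up for illustration.

```python
# Combine several weak, imperfect verifiers: each 0/1 vote is weighted by the
# verifier's estimated accuracy (e.g. from weak supervision), then the
# candidate with the highest combined log-odds score wins.
import math

def combine_verifier_scores(votes: dict[str, list[int]],
                            accuracies: list[float]) -> str:
    """votes maps each candidate answer to one 0/1 vote per verifier;
    accuracies[i] is the estimated accuracy of verifier i."""
    def log_odds(candidate_votes: list[int]) -> float:
        score = 0.0
        for v, acc in zip(candidate_votes, accuracies):
            weight = math.log(acc / (1.0 - acc))  # confident verifiers count more
            score += weight if v == 1 else -weight
        return score
    return max(votes, key=lambda ans: log_odds(votes[ans]))

# Example: two candidate answers, three verifiers with estimated accuracies.
votes = {"answer_A": [1, 1, 0], "answer_B": [0, 1, 1]}
print(combine_verifier_scores(votes, accuracies=[0.9, 0.6, 0.7]))  # answer_A
```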
Congratulations, @CaiaCostello and Adrian!
So proud of @CaiaCostello who graduated with her CS master's from @stanfordeng 🎓 today! Lucky to have helped her with the TPT project along with @annadgoldie and @Azaliamirh. This is from her presenting the TPT poster at the ICLR 🇸🇬 workshop!
Congratulations, Dr. Goldie! @annadgoldie
Huge congratulations to @annadgoldie on receiving her @Stanford PhD today! It’s been a great journey!
This is a proper vibe-coding setup for GPU programmers, and it can get you surprisingly far! I honestly think that if this authoring experience is v1, then v10 might become the normal way GPU experts start writing serious custom kernels! Great work @anneouyang! (finally…
✨ New blog post 👀: We have some very fast AI-generated kernels, found with a simple test-time-only search. They perform close to, and in some cases even beat, the standard expert-optimized production kernels shipped in PyTorch. (1/6) [🔗 link in final post]
Go, @realSharonZhou and team! Congrats to @LisaSu and AMD on such an amazing addition!
Welcome aboard @realSharonZhou! So happy to have you and the team joining us as we bring @AIatAMD to the world!!!
I like this idea very much and have long advocated for something like this. A synthetically enriched «KV prefix» is a natural augmentation to modern long-context models.
Cartridges: Storing long contexts in tiny caches with self-study
- train-once, reusable memory via SELF-STUDY
- 38.6× less memory, 26.4× higher throughput
- extends context to 484k, composes across corpora
- outperforms LoRA, DuoAttention, and standard ICL
BLOG:…
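For readers unfamiliar with the baseline being improved on, here is a minimal sketch of the plain precompute-and-reuse KV-prefix pattern, assuming the Hugging Face transformers cache-reuse API; Cartridges goes further by training a much smaller cache via self-study instead of caching the raw corpus. The model name and corpus string are placeholders.

```python
# Precompute the KV cache for a long corpus once, then reuse it across many
# queries. This is the plain "KV prefix" baseline, not Cartridges itself.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

corpus = "..."  # long document whose encoding we want to amortize over queries
corpus_ids = tok(corpus, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(corpus_ids, use_cache=True).past_key_values

def answer(query: str) -> str:
    ids = torch.cat([corpus_ids, tok(query, return_tensors="pt").input_ids], dim=-1)
    # Deep-copy the cache so each query starts from the same clean prefix.
    out = model.generate(ids, past_key_values=copy.deepcopy(prefix_cache),
                         max_new_tokens=32)
    return tok.decode(out[0, ids.shape[-1]:], skip_special_tokens=True)
```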
Excited to be presenting our new work, HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation, at #CVPR2025 this week. VAR (Visual Autoregressive Modelling) introduced a very nice way to formulate autoregressive image generation as a next-scale prediction task (from…
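As a rough mental model of next-scale prediction (my own sketch, not HMAR's code): instead of emitting image tokens one at a time in raster order, the model emits an entire token map at each successively finer scale, conditioned on all coarser maps.

```python
# Conceptual sketch of VAR-style next-scale prediction: generate token maps
# coarse-to-fine, each conditioned on everything generated so far, then decode
# the finest map back to pixels with a (VQ)VAE decoder.
import torch

scales = [1, 2, 4, 8]   # side lengths of the token maps, coarse to fine
generated = []          # token maps produced so far

def predict_next_scale(prefix_maps: list[torch.Tensor], side: int) -> torch.Tensor:
    """Stand-in for the transformer: returns a (side x side) map of token ids."""
    return torch.zeros(side, side, dtype=torch.long)

for side in scales:
    next_map = predict_next_scale(generated, side)  # condition on coarser maps
    generated.append(next_map)
# generated[-1] is the finest-scale token map, ready for the decoder.
```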