Anne Ouyang
@anneouyang
CS PhD student @Stanford | prev: cuDNN @Nvidia, M.Eng, B.S. in CS @MIT | efficient scalable self-improving AI systems | 🌽KernelBench
✨ New blog post 👀: We have some very fast AI-generated kernels, produced with a simple test-time-only search. They perform close to, and in some cases even beat, the standard expert-optimized production kernels shipped in PyTorch. (1/6) [🔗 link in final post]
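The thread doesn't show code, but a test-time-only search over kernels is conceptually simple: sample many candidate implementations, discard any that don't match a reference on test inputs, and keep the fastest survivor. Here is a minimal sketch of that loop, assuming a CUDA device and candidate kernels already materialized as Python callables (in practice the candidate list would come from an LLM; this is not the blog post's actual pipeline):

```python
import torch

def bench_ms(fn, *args, warmup=10, iters=100):
    """Average per-call latency in milliseconds, using CUDA-event timing."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

def search(reference_fn, candidates, inputs, atol=1e-4):
    """Test-time search: keep only correct candidates, return the fastest."""
    expected = reference_fn(*inputs)
    best_fn, best_ms = reference_fn, bench_ms(reference_fn, *inputs)
    for fn in candidates:
        try:
            out = fn(*inputs)
        except Exception:
            continue  # candidate crashed: discard
        if not torch.allclose(out, expected, atol=atol):
            continue  # candidate is wrong: discard
        ms = bench_ms(fn, *inputs)
        if ms < best_ms:
            best_fn, best_ms = fn, ms
    return best_fn, best_ms
```

Correctness filtering before timing is the important part: a fast-but-wrong kernel is worthless, so candidates only get benchmarked after matching the reference output.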
Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!
tech is full of people quietly haunted by the artist they could've been ✨

We introduce CodeARC, a new benchmark for evaluating LLMs’ inductive reasoning. Agents must synthesize functions from I/O examples—no natural language, just reasoning. 📄 arxiv.org/pdf/2503.23145 💻 github.com/Anjiang-Wei/Co… 🌐 anjiang-wei.github.io/CodeARC-Websit… #LLM #Reasoning #LLM4Code #ARC
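For context on the task format: a CodeARC-style problem gives the agent only input/output pairs and asks for a function consistent with them. A toy mock-up of that setup (illustrative only; not CodeARC's actual data or harness):

```python
# Toy inductive-synthesis task: the agent sees only these I/O pairs.
io_examples = [((2, 3), 6), ((4, 5), 20), ((0, 7), 0)]  # hidden rule: a * b

def check(candidate, examples):
    """A proposed function passes only if it reproduces every observed output."""
    return all(candidate(*inputs) == output for inputs, output in examples)

# One candidate the agent might propose, with no natural-language spec to lean on:
def candidate(a, b):
    return a * b

assert check(candidate, io_examples)
```

The real benchmark is interactive, letting agents query additional inputs to disambiguate between the many functions that fit a finite example set; this toy omits that.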
This is a proper Vibe-coding setup for GPU programmers, and it can get you surprisingly far! I honestly think that if this authoring experience is v1, then v10 might become the normal way GPU experts start writing serious custom kernels! Great work @anneouyang! (finally…
Exciting! Looking forward to it
KernelBench by @simonguozirui and @anneouyang is about to land in prime-rl 🛬 Our next reasoning model will be much better at writing kernels!
Thanks for the repro! I also attached the result of running this layer norm kernel on an Nvidia 5090 (1311% perf of baseline) Kernels are very hardware (and problem size) dependent, but that’s great news for auto kernel optimization. AI can easily run architecture and workload…
Did a mini replication on Colab of the LayerNorm kernel (because 484.4% seemed hard to believe) and it ~replicates (T4 vs L40 etc). Super impressive work! Even kernel engineers aren't safe.
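The percentages in this exchange (484.4% on one setup, 1311% on a 5090) read as baseline time divided by custom-kernel time, and as both tweets note, they move with the GPU and the problem size. A sketch of how such a comparison can be run against PyTorch's eager layer norm (`custom_fn` is a placeholder for whichever generated kernel is being tested; this is not the exact Colab setup):

```python
import torch
import torch.nn.functional as F

def percent_of_baseline(custom_fn, rows=4096, cols=8192, iters=200):
    """Custom-kernel speed as a percentage of the PyTorch eager baseline."""
    x = torch.randn(rows, cols, device="cuda")
    weight = torch.ones(cols, device="cuda")
    bias = torch.zeros(cols, device="cuda")

    def time_ms(fn):
        for _ in range(20):  # warmup, so timing excludes one-time setup costs
            fn()
        torch.cuda.synchronize()
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters

    baseline = time_ms(lambda: F.layer_norm(x, (cols,), weight, bias))
    custom = time_ms(lambda: custom_fn(x, weight, bias))
    return 100.0 * baseline / custom  # 484.4% would mean ~4.8x faster
```

Re-running the same script on a T4, an L40, or a 5090 gives different numbers, which is exactly the hardware dependence the replies point out.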
cool work by @CaiaCostello! interesting takeaway about model collapse from the angle of confidence vs. diversity in math and coding tasks
1/5 Can small models learn to reason without RL or large datasets? Success of LLM post-training with synthetic data hinges on:
1. Generating Model Size
2. Synthetic Data Volume
3. Pruning Strategy
4. Number of Fine-Tuning Rounds
We found a simple recipe: Think, Prune, Train (TPT)
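Read as an algorithm, the TPT recipe is a self-improvement loop: the model generates its own training data, a correctness filter prunes it, and the model is fine-tuned on what survives, repeated for several rounds. A schematic of that loop, with the model-specific pieces passed in as callables since the paper's actual generation, checking, and fine-tuning code isn't in the thread:

```python
from typing import Callable, List, Tuple

def think_prune_train(
    model,
    problems: List[str],
    generate: Callable,      # (model, problem, n) -> list of candidate solutions
    passes_check: Callable,  # (problem, solution) -> bool, e.g. tests or answer match
    finetune: Callable,      # (model, examples) -> updated model
    rounds: int = 3,
    samples_per_problem: int = 8,
):
    """Schematic Think-Prune-Train loop: self-generate, filter, fine-tune, repeat."""
    for _ in range(rounds):
        # Think: sample candidate solutions from the current model.
        candidates: List[Tuple[str, str]] = [
            (p, sol)
            for p in problems
            for sol in generate(model, p, n=samples_per_problem)
        ]
        # Prune: keep only solutions that pass the correctness check.
        kept = [(p, s) for p, s in candidates if passes_check(p, s)]
        # Train: fine-tune on the surviving synthetic data.
        model = finetune(model, kept)
    return model
```

The four knobs in the tweet map directly onto this loop: the generating model's size, how much synthetic data `generate` produces per problem, how aggressively `passes_check` prunes, and how many rounds to run.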
KernelBench as a whole is great! It's a big step forward for automatic kernel generation. Looking forward to the next version
Fresh on the arXiv and powered by Modal: work from @anneouyang, @simonguozirui, and others on writing faster kernels for model inference using large language models 🪆
arXiv's out!
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
congrats on the launch!!
Write a fast kernel and run it on Discord. See how you compare against the best! If you're familiar with LeetCode, Kaggle, or Codeforces, then this should feel right at home
another interesting work on LLM for kernel gen ft. KernelBench!
Introducing The AI CUDA Engineer: An agentic AI system that automates the production of highly optimized CUDA kernels. sakana.ai/ai-cuda-engine… The AI CUDA Engineer can produce highly optimized CUDA kernels, reaching 10-100x speedup over common machine learning operations in…