Simon Guo
@simonguozirui
CS PhD student @Stanford | 🎓 @Berkeley_EECS | prev pre-training @cohere & built things at @anyscalecompute @nvidia
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
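Roughly, the evaluation comes down to a correctness check against the eager reference plus a wall-clock comparison. A simplified sketch, not the actual KernelBench harness (the models, inputs, warmup counts, and tolerances here are stand-ins):

```python
import time
import torch

def speedup_over_eager(ref_model, candidate_model, inputs, warmup=3, iters=10):
    """Sketch of a correctness + speedup check for a generated kernel (assumes CUDA)."""
    # Correctness: candidate must match the PyTorch eager reference output.
    with torch.no_grad():
        ref_out = ref_model(*inputs)
        cand_out = candidate_model(*inputs)
    assert torch.allclose(ref_out, cand_out, rtol=1e-2, atol=1e-2), "output mismatch"

    def bench(model):
        for _ in range(warmup):
            model(*inputs)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(*inputs)
        torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters

    return bench(ref_model) / bench(candidate_model)  # >1.0 means faster than eager
```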

When the eager mode fallback swoops in quickly after torch.compile hits a part of the graph it can't compile
Luigi Death Stare (2014)
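For the non-meme version: a single graph break is enough to trigger that fallback. A minimal illustration (the `.item()` call is just one common way to force a break under default dynamo settings):

```python
import torch

@torch.compile
def f(x):
    # .item() on a tensor forces a graph break by default:
    # torch.compile runs the traced portion and falls back to eager for the rest.
    if x.sum().item() > 0:
        return x * 2
    return x - 1

print(f(torch.randn(4)))
```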
~4/8~ For the forward pass, we developed a specialized two-kernel implementation. The first fuses gather, projections, and SwiGLU, while the second uses an efficient scatter into BF16 + reduce for the sum.
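Written out as an unfused PyTorch reference, the computation those two kernels cover looks roughly like this (a sketch of the math only, not the fused kernels themselves; shapes and names are illustrative):

```python
import torch
import torch.nn.functional as F

def moe_forward_reference(x, topk_ids, topk_w, w1, w3, w2):
    """Unfused reference of a top-k MoE forward pass.

    x:        [T, d]  tokens
    topk_ids: [T, k]  expert index per (token, slot)
    topk_w:   [T, k]  router weight per (token, slot)
    w1, w3:   [E, d, h]  expert up-projections (SwiGLU)
    w2:       [E, h, d]  expert down-projection
    """
    T, k = topk_ids.shape
    out = torch.zeros_like(x, dtype=torch.bfloat16)
    flat_ids = topk_ids.reshape(-1)                      # [T*k]
    flat_w = topk_w.reshape(-1, 1)                       # [T*k, 1]
    token_idx = torch.arange(T, device=x.device).repeat_interleave(k)
    for e in range(w1.shape[0]):
        sel = (flat_ids == e).nonzero(as_tuple=True)[0]
        if sel.numel() == 0:
            continue
        rows = token_idx[sel]
        xe = x[rows]                                     # gather tokens routed to expert e
        h = F.silu(xe @ w1[e]) * (xe @ w3[e])            # "kernel 1": gather + projections + SwiGLU
        ye = (h @ w2[e]) * flat_w[sel]                   # down-projection, scaled by router weight
        out.index_add_(0, rows, ye.to(torch.bfloat16))   # "kernel 2": scatter into BF16 + reduce (sum)
    return out
```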
Mixture‑of‑Experts (MoE) powers many frontier models like R1, K2, & Qwen3 ⚡️ To make frontier-scale MoE models accessible to train, we open-source MoMoE, a hyper-performant MoE implementation built for training and inference, outpacing the fastest existing ones by up to: - 70%…
Announcing The Toronto School Of Foundation Modelling, a Toronto-exclusive, in-person-only school for learning to build Foundation Models. Coming to New Stadium and Youthful Vengeance in late August 2025.
On Sep 6 in NYC, this won't be your typical hackathon where you do your own thing in a corner and then present at the end of the day. You'll deploy real models to the market, trades will happen, chaos should be expected. The fastest model is great but time to market matters more.
Releasing mini, a radically simple SWE-agent: 100 lines of code, 0 special tools, and gets 65% on SWE-bench verified! Made for benchmarking, fine-tuning, RL, or just for use from your terminal. It’s open source, simple to hack, and compatible with any LM! Link in 🧵
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an…
Our new state-of-the-art AI model Aeneas transforms how historians connect the past. 📜 Ancient inscriptions often lack context – it's like solving a puzzle with 90% of the pieces lost to time. Aeneas helps researchers interpret and situate inscriptions in their historical context. 🧵
We tested Aeneas on the Res Gestae Divi Augusti – one of the most debated inscriptions. Without prior knowledge, it successfully mapped out the leading scholarly theories on its dating, showing how AI can help model history in a quantitative way. 📊
🏆 Our @nvidia KV Cache Compression Leaderboard is now live! Compare state-of-the-art compression methods side-by-side with KVPress. See which techniques are leading in efficiency and performance. 🥇 huggingface.co/spaces/nvidia/…
Check out Tokasaurus on Modal to make Llama-1B brrr! This repeated sampling example shows off two engine features that are important for serving small models: very low CPU overhead and automatic shared prefix exploitation with Hydragen.
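The Hydragen idea in one hedged sketch: attend to the shared prompt once, attend to each sample's own suffix separately, and merge the two partial softmaxes via their normalizers (shapes and names are illustrative, not the Tokasaurus API; causal masking is omitted):

```python
import math
import torch

def partial_attn(q, k, v):
    # Attention over one KV chunk, returning the output plus its log-sum-exp normalizer.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])   # [B, Tq, Tk]
    lse = torch.logsumexp(scores, dim=-1)                        # [B, Tq]
    out = torch.exp(scores - lse.unsqueeze(-1)) @ v              # softmax(scores) @ v
    return out, lse

def shared_prefix_attn(q, k_prefix, v_prefix, k_suffix, v_suffix):
    """q: [B, Tq, d] queries for B samples that all share one prompt.
    k_prefix/v_prefix: [1, Tp, d] shared-prompt KV stored once.
    k_suffix/v_suffix: [B, Ts, d] per-sample KV."""
    B = q.shape[0]
    o_p, lse_p = partial_attn(q, k_prefix.expand(B, -1, -1), v_prefix.expand(B, -1, -1))
    o_s, lse_s = partial_attn(q, k_suffix, v_suffix)
    # Merge the two partial softmaxes using their normalizers.
    m = torch.maximum(lse_p, lse_s)
    w_p, w_s = torch.exp(lse_p - m), torch.exp(lse_s - m)
    return (o_p * w_p.unsqueeze(-1) + o_s * w_s.unsqueeze(-1)) / (w_p + w_s).unsqueeze(-1)
```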
Tokasaurus, the "little LLM engine that could" by @jordanjuravsky and @EyubogluSabri of @HazyResearch/@ScalingIntelLab, is capable of some pretty impressive perf. We replicated their report of >80k tok/s for 16bit LLaMA 3.1 8B on Large Language Monkeys GSM8K - and you can too!
officially out! imo this is super exciting for two reasons: 1) text in, text out, with no Lean decoding or tool use, and 2) reasoning capabilities that scale with compute
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
Infinite Wiki ⁕ Every word is a hyperlink. Every description is generated in real time (in ~1 second). ⁕ Runs on Gemini 2.5 Flash Lite; ASCII diagrams use 2.5 Flash.
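The core loop is one fast model call per clicked word. A rough sketch with the google-genai SDK (the prompt and wiring are guesses, not the actual app):

```python
from google import genai

client = genai.Client()  # assumes an API key is set in the environment

def describe(word: str) -> str:
    # One low-latency call per clicked word.
    resp = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=f"Write a two-sentence encyclopedia entry for '{word}'.",
    )
    return resp.text

print(describe("hyperlink"))
```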
developer.nvidia.com/blog/cutlass-p… marks the start of a short series of blogposts about CUTLASS 3.x and CuTe that we've been meaning to write for years. There are a few more parts to come still, hope you enjoy!
Watching the model solve these IMO problems and achieve gold-level performance was magical. A few thoughts 🧵
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
After studying the mathematics and computation of Sparsity for nearly 20 years, I have just realized that it is far more important than I ever appreciated. It truly serves as *the* model problem to understand deep networks and even intelligence to a large extent, from a…
And @keshigeyan is going to be presenting about Grafting - a great collaboration with @MichaelPoli6 on how to distill pretrained diffusion models into new architectures (Transformers -> Hyenas) 4/
1/ Model architectures have been mostly treated as fixed post-training. 🌱 Introducing Grafting: A new way to edit pretrained diffusion transformers, allowing us to customize architectural designs on a small compute budget. 🌎 grafting.stanford.edu Co-led with @MichaelPoli6
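The general recipe behind this kind of architecture editing, very roughly: swap one operator inside a pretrained block and regress the replacement onto the original's activations. A schematic sketch, not the paper's actual procedure (`block.attn`, the new operator, and the calibration loop are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def graft_block(block: nn.Module, new_op: nn.Module, calib_batches, steps=100):
    """Replace one operator in a pretrained block and distill it locally."""
    old_op = block.attn                          # placeholder attribute name
    opt = torch.optim.AdamW(new_op.parameters(), lr=1e-4)
    for step in range(steps):
        x = calib_batches[step % len(calib_batches)]
        with torch.no_grad():
            target = old_op(x)                   # teacher: the original operator's output
        loss = F.mse_loss(new_op(x), target)     # regress the replacement onto it
        opt.zero_grad()
        loss.backward()
        opt.step()
    block.attn = new_op                          # graft the trained replacement in
    return block
```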
If you’re staying for the #ICML2025 workshops, you should definitely go to @m_sirovatka’s talk today on the infra and design of @GPU_MODE’s OSS GPU leaderboard. He has a lot of interesting stuff to share :D
can't stop thinking about this one. insanely elegant, seems insanely powerful
Large Language Monkeys are scaling and they are hungry! 🍌 #ICML2025
I'll be at @icmlconf #ICML2025 next week to present three papers - reach out if you want to chat about generative AI, scaling laws, synthetic data or any other AI topic! #1 How Do Large Language Monkeys Get Their Power (Laws)? x.com/RylanSchaeffer…
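For reference, the coverage metric behind those power-law curves is the standard unbiased pass@k estimator averaged over problems. A small sketch (variable names are mine):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples, drawn from n
    attempts of which c were correct, solves the problem."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Coverage at k = mean pass@k over problems; plotted against k it often
# follows an approximate power law, which is what the paper studies.
```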