Alex Zhang

@a1zhang

@SakanaAILabs, incoming phd student @MIT_CSAIL, ugrad @princeton | go participate in the @GPU_MODE kernel competitions!!!

USA

Joined December 2015

535 Following

12K Followers

Pinned

Alex Zhang @a1zhang · May 28

Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇

Alex Zhang @a1zhang · Jul 18

If you’re staying for the #ICML2025 workshops, you should definitely go to @m_sirovatka’s talk today on the infra and design of @GPU_MODE’s OSS GPU leaderboard. He has a lot of interesting stuff to share :D

Alex Zhang @a1zhang · Jul 17

Bro actually denied OpenAI an AlphaGo moment LOL @FakePsyho is him. Huge congrats👏👏

Alex Zhang @a1zhang · Jul 16

New @GPU_MODE x Jane Street 1-day GPU programming hackathon in-person in NYC! Talks by the wonderful @tri_dao, @soumithchintala, and other PyTorch folks! If you're at #ICML25, check out more information at the Jane Street booth! Register by Aug 17: bit.ly/3TS0d9I?r=qr

Alex Zhang @a1zhang · Jul 13

sadly won’t be at ICML but have 2 papers that you should check out! KernelBench which @simonguozirui will be presenting at the main conference ^_^ + the @GPU_MODE leaderboard’s OSS infra at the CODEML workshop (7/19) that @m_sirovatka will be giving an oral for! Lots of 🍿!!

Alex Zhang Retweeted

SWE-bench @SWEbench · Jul 11

SWE-agent is now Multimodal! 😎 We're releasing SWE-agent Multimodal, with image-viewing abilities and a full web browser for debugging front-ends. Evaluate your LMs on SWE-bench Multimodal or use it yourself for front-end dev. 🔗➡️

Alex Zhang @a1zhang · Jul 11

Very much a noob question, but for benchmarking CUDA code speed we generally have to clear caches so multiple repeated runs are fair. If I were to benchmark CPU code speed (e.g. on AlgoTune), does a similar principle apply? And how easy is it to do this in say Python?

Alex Zhang @a1zhang · Jul 11

ATP when I read that a model scored X% overall speedup on a benchmark, my brain doesn’t know how to react. “AI to optimize X” benchmarks shouldn’t be reported as average improvement over a fixed baseline; it’s super inflated and confusing. Are there better alternatives?
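One commonly used alternative for ratio-valued results is the geometric mean of per-task speedups (SPEC CPU, for example, aggregates its ratios with geometric means), optionally reported next to the fraction of tasks that beat the baseline at all. The sketch below uses made-up speedup numbers and hypothetical helper names to show how a single outlier inflates the arithmetic mean while barely moving the geometric mean.

```python
import statistics
from typing import Sequence


def arithmetic_mean_speedup(speedups: Sequence[float]) -> float:
    """Plain average of baseline_time / candidate_time ratios; dominated by outliers."""
    return statistics.fmean(speedups)


def geometric_mean_speedup(speedups: Sequence[float]) -> float:
    """Geometric mean: a 2x speedup and a 2x slowdown cancel out to 1.0."""
    return statistics.geometric_mean(speedups)


def fraction_faster(speedups: Sequence[float], threshold: float = 1.0) -> float:
    """Share of tasks whose speedup strictly exceeds `threshold`."""
    return sum(s > threshold for s in speedups) / len(speedups)


# Hypothetical per-task speedups: one huge win, two 2x slowdowns, one tie.
speedups = [100.0, 0.5, 0.5, 1.0]
print(arithmetic_mean_speedup(speedups))  # 25.5
print(geometric_mean_speedup(speedups))   # ~2.236 (the 4th root of 25)
print(fraction_faster(speedups))          # 0.25
```

Here the arithmetic mean reports a 25.5x "average speedup" even though three of the four tasks did not improve; the geometric mean and the fraction of improved tasks make that much harder to hide.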

Alex Zhang @a1zhang · Jul 10

Does anyone know the differences between nvbench, Triton’s do_bench, and DeepSeek DeepGEMM’s bench_kineto (which calls the PyTorch profiler with an L2 cache flush)? Just looking to accurately benchmark kernels over a fixed set of shapes (the input distribution can vary) while also flushing the cache.

Alex Zhang @a1zhang · Jul 8

BTW this number is only a tiny fraction of what we have planned :p

Matej Sirovatka @m_sirovatka · Jul 8

The biggest dataset of human-written GPU code, all open-source? 👀 YES please! We at @GPU_MODE have released around 40k 🚀 human-written code samples spanning Triton, HIP and PyTorch, and it's all open on the @huggingface Hub. Train the new GPT to make GPTs faster ⚡️ Link below ⬇️

Alex Zhang Retweeted

Matej Sirovatka @m_sirovatka · Jul 8

Alex Zhang Retweeted

Ori Press @ori_press · Jul 2

Do language models have algorithmic creativity? To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here!🧵⬇️

Alex Zhang @a1zhang · Jul 2

small life update before the PhD: bittersweet moment, but I recently left the awesome folks @vant_ai & will put my bioml interests on hold for a bit. in other news, I’ve joined @SakanaAILabs for the summer! happy to chat abt either :p

Alex Zhang Retweeted

hardmaru @hardmaru · Jul 1

Inference-Time Scaling and Collective Intelligence for Frontier AI sakana.ai/ab-mcts/ We developed AB-MCTS, a new inference-time scaling algorithm that enables multiple frontier AI models to cooperate, achieving promising initial results on the ARC-AGI-2 benchmark.…

Alex Zhang Retweeted

Hao AI Lab @haoailab · Jun 30

🔥 Pokémon Red is becoming a go-to benchmark for testing advanced AIs such as Gemini. But is Pokémon Red really a good eval? We study this problem and identify three issues: 1️⃣ Navigation tasks are too hard. 2️⃣ Combat control is too simple. 3️⃣ Raising a strong Pokémon team is…

Alex Zhang @a1zhang · Jun 29

post move-out plans will not include working on the GPU codegen model, sorry @m_sirovatka
