GPU MODE
@GPU_MODE
Your favorite GPU community
Watch my talk about NATTEN on @GPU_MODE this Saturday at 3PM ET / noon PT. I'll go over all the exciting new features we shipped very recently, especially our Hopper and Blackwell FNA kernels, now speeding up video / world models by up to 2.6X e2e! youtube.com/watch?v=mF_H_J
👀
nvidia could do the most viral ai competition in history: start with 10,000 researchers and give each a free gpu to work on a public leaderboard but do rounds of elimination where the winners take the remaining hardware. the final winner gets all the gpus for a year.
I solved every single problem in the CUDA mode book. A quick thread summarizing this experience and what I learned 1/x
If you’re curious to learn more, Joe is talking to us at noon PST today
we've launched a Luminal kernel search demo! you can see the process Luminal goes through to find the fastest GPU kernels, searching through loop structures, algebraic rewrites, tiling patterns and more!
The biggest dataset of human-written GPU code, all open-source? 👀 YES please! We at @GPU_MODE have released around 40k 🚀 human-written code samples spanning Triton, HIP and PyTorch and it's all open on the @huggingface Hub. Train the new GPT to make GPTs faster ⚡️ Link below ⬇️
If you want to hack on your own GPU schedules instead of being stuck with whatever the compiler gives you then join us in 30 min!
I'm giving a talk at GPU mode tomorrow. Feel free to join the livestream: youtube.com/live/J58AdFTHp…
Announcing a new @GPU_MODE kernel writing competition: our first featuring both NVIDIA and AMD hardware! The first problem will be the Triangle Multiplication operator essential to the AlphaFold 🧬 models! It's a particularly tricky problem with no good public implementation!
the follow up to @karpathy neural networks: zero to hero course is being built. singularity systems: zero to hero builds pytorch1/2 clones from scratch, training gpt2. looking for hardcore hackers to join the core team. come join the work group in the @GPU_MODE discord.
kind of a surreal moment being on stage with Lisa Su as she announces & thanks us for the competition we built. the past year building w/ @m_sirovatka @marksaroufim, Ben, & Erik (all in our free time :p) on @GPU_MODE has been genuinely incredible, can’t thank you guys enough ❤️
Yay :) I gave this talk on WebGPU for general purpose GPU computation last year @GPU_MODE youtube.com/watch?v=Ll5Sr1…
WebGPU enabled by default in Safari 26. Long time coming.
Been excited about this talk for a while, @SonglinYang4 on efficient architecture! Just started! youtube.com/watch?v=j4zJbr…
This is a write-up of the 2nd place entry in the FP8 matmul kernel competition for AMD GPUs. Very insightful: github.com/seb-v/amd_chal…
I will be giving a talk in @GPU_MODE tomorrow (May 31 12pm PST) about FastVideo/STA/VSA. Come if you're interested! youtube.com/watch?v=x44iGp…
I will be giving a talk in @GPU_MODE tomorrow (May 24 12pm PST) about Disaggregated Inference. Come if you're interested! youtube.com/live/uc6TnOszz…
This has been an amazing collaboration between teams at @Stanford @metaai @GPU_MODE @PyTorch. If you're interested in making GPU programming dramatically more accessible then join us! There's a lot more stuff we're cooking! gpu-mode.github.io/popcorn/
Meta just released KernelLLM 8B on Hugging Face ⚡ > On KernelBench-Triton Level 1, our 8B parameter model exceeds models such as GPT-4o and DeepSeek V3 in single-shot performance 🤯 > With multiple inferences, KernelLLM's performance outperforms DeepSeek R1
Tomorrow (5/17), the CuTe creator, Cris Cecka, will teach CuTe himself on GPU Mode. youtube.com/watch?v=ufa4pm…
ICYMI @GPU_MODE at GTC brought together leading voices in machine learning systems for an evening of sharp talks and fresh perspectives. 🎥 youtu.be/mdDVkBeFy9A From KernelBench to ThunderKittens, see what’s next in ML systems with speakers from Stanford, NVIDIA, PyTorch,…
Livestream starting with @GPU_MODE! 🔥 youtube.com/live/yOMflrCRy…
🚨 Live tomorrow at 12 PM PT, join us on the @GPU_MODE livestream for a deep dive on Mojo, MAX, & GPU programming, including a new tile-based Mojo programming model and a look at how we surpass the performance of vendor libraries on key algorithms 👀: youtube.com/live/yOMflrCRy…
📣 Problem 2, the fused Mixture-of-Experts kernel 🍿 for MI300s, is now OPEN for the @AMD x @GPU_MODE $100k competition! Go compete now for huge cash prizes -- registration ends SOON! Good luck everyone!
Woah - We are now down to 183.429μs for FP8 GEMM on MI300X (we started at 890.743μs) on the leaderboard!!! Let's go!!! gpumode.com/leaderboard/399
📢 ATTN: AI Developers! Are you ready to optimize, accelerate, and compete? Join the AMD Developer Challenge 2025: Inference Sprint. Push inference performance to the limit on the AMD ROCm software platform with cloud-based AMD MI300X. 🏆 $100K grand prize + $50K in additional…