GPU MODE

@GPU_MODE

Your favorite GPU community

Joined September 2024

9 Following

3K Followers

GPU MODE Retweeted

Ali Hassani@AliHassaniJr · Jul 24

Watch my talk about NATTEN on @GPU_MODE this Saturday at 3PM ET / noon PT. I'll go over all the exciting new features we shipped very recently, especially our Hopper and Blackwell FNA kernels, now speeding up video / world models by up to 2.6X e2e! youtube.com/watch?v=mF_H_J

GPU MODE@GPU_MODE · Jul 20

👀

Cristian Garcia@cgarciae88 · Jul 19

nvidia could do the most viral ai competition in history: start with 10,000 researchers and give each a free gpu to work on a public leaderboard but do rounds of elimination where the winners take the remaining hardware. the final winner gets all the gpus for a year.

GPU MODE Retweeted

Piotr Mazurek@tugot17 · Jul 12

I solved every single problem in the CUDA mode book. A quick thread summarizing this experience and what I learned 1/x

GPU MODE@GPU_MODE · Jul 12

If you’re curious to learn more, Joe is talking to us at noon PST today.

Joe Fioti@joefioti · Jul 12

we've launched a Luminal kernel search demo! you can see the process Luminal goes through to find the fastest GPU kernels, searching through loop structures, algebraic rewrites, tiling patterns and more!

GPU MODE Retweeted

Matej Sirovatka@m_sirovatka · Jul 8

The biggest dataset of human-written GPU code, all open-source? 👀 YES Please! We at @GPU_MODE have released around 40k 🚀 human-written code samples spanning Triton, HIP and PyTorch, and it's all open on the @huggingface Hub. Train the new GPT to make GPTs faster ⚡️ Link below ⬇️

GPU MODE@GPU_MODE · Jun 28

If you want to hack on your own GPU schedules instead of being stuck with whatever the compiler gives you, then join us in 30 min!

Yuka Ikarashi@c20 · Jun 27

I'm giving a talk at GPU mode tomorrow. Feel free to join the livestream: youtube.com/live/J58AdFTHp…

GPU MODE Retweeted

Alex Zhang@a1zhang · Jun 24

Announcing a new @GPU_MODE kernel writing competition: our first featuring both NVIDIA and AMD hardware! The first problem will be the Triangle Multiplication operator essential to the AlphaFold 🧬 models! It's a particularly tricky problem with no good public implementation!

GPU MODE Retweeted

j4orz@j4orz · Jun 11

the follow up to @karpathy neural networks: zero to hero course is being built. singularity systems: zero to hero builds pytorch1/2 clones from scratch, training gpt2. looking for hardcore hackers to join the core team. come join the work group in the @GPU_MODE discord.

GPU MODE Retweeted

Alex Zhang@a1zhang · Jun 13

kind of a surreal moment being on stage with Lisa Su as she announces & thanks us for the competition we built the past year. building w/ @m_sirovatka @marksaroufim, Ben, & Erik (all in our free time :p) on @GPU_MODE has been genuinely incredible, can’t thank you guys enough ❤️

GPU MODE@GPU_MODE · Jun 10

Yay :) I gave this talk on WebGPU for general purpose GPU computation last year @GPU_MODE youtube.com/watch?v=Ll5Sr1…

Daniel Hooper@DanielcHooper · Jun 9

WebGPU enabled by default in Safari 26. Long time coming.

GPU MODE@GPU_MODE · Jun 7

Been excited about this talk for a while, @SonglinYang4 on efficient architecture! Just started! youtube.com/watch?v=j4zJbr…

GPU MODE Retweeted

Tim Dettmers@Tim_Dettmers · Jun 6

This is a write-up of the 2nd place entry in the FP8 matmul kernel competition for AMD GPUs. Very insightful: github.com/seb-v/amd_chal…

GPU MODE Retweeted

Perry Zhang@PY_Z001 · May 30

I will be giving a talk in @GPU_MODE tomorrow (May 31 12pm PST) about FastVideo/STA/VSA. Come if you're interested! youtube.com/watch?v=x44iGp…

GPU MODE Retweeted

Junda Chen@Junda_Chen_ · May 23

I will be giving a talk in @GPU_MODE tomorrow (May 24 12pm PST) about Disaggregated Inference. Come if you're interested! youtube.com/live/uc6TnOszz…

GPU MODE@GPU_MODE · May 19

This has been an amazing collaboration between teams at @Stanford, @metaai, @GPU_MODE, and @PyTorch. If you're interested in making GPU programming dramatically more accessible, then join us! There's a lot more stuff we're cooking! gpu-mode.github.io/popcorn/

Vaibhav (VB) Srivastav@reach_vb · May 19

Meta just released KernelLLM 8B on Hugging Face ⚡
> On KernelBench-Triton Level 1, our 8B parameter model exceeds models such as GPT-4o and DeepSeek V3 in single-shot performance 🤯
> With multiple inferences, KernelLLM's performance outperforms DeepSeek R1

GPU MODE Retweeted

Haicheng Wu@asdf1234_0 · May 16

Tomorrow (5/17), the CuTe creator, Cris Cecka, will teach CuTe himself on GPU Mode. youtube.com/watch?v=ufa4pm…

GPU MODE Retweeted

NVIDIA AI Developer@NVIDIAAIDev · May 9

ICYMI @GPU_MODE at GTC brought together leading voices in machine learning systems for an evening of sharp talks and fresh perspectives. 🎥 youtu.be/mdDVkBeFy9A From KernelBench to Thunderkittens, see what’s next in ML systems with speakers from Stanford, NVIDIA, PyTorch,…

GPU MODE@GPU_MODE · May 2

Livestream starting with @GPU_MODE! 🔥 youtube.com/live/yOMflrCRy…

Modular@Modular · May 1

🚨 Live tomorrow at 12 PM PT, join us on the @GPU_MODE livestream for a deep dive on Mojo, MAX, & GPU programming, including a new tile-based Mojo programming model and a look at how we surpass the performance of vendor libraries on key algorithms 👀: youtube.com/live/yOMflrCRy…

GPU MODE Retweeted

Alex Zhang@a1zhang · Apr 29

📣 Problem 2, the fused Mixture-of-Experts kernel 🍿 for MI300s, is now OPEN for the @AMD x @GPU_MODE $100k competition! Go compete now for huge cash prizes -- registration ends SOON! Good luck everyone!

GPU MODE@GPU_MODE · Apr 21

Woah - we are now down to 183.429μs for FP8 GEMM on MI300X (we started at 890.743μs) on the leaderboard!!! Let's go!!! gpumode.com/leaderboard/399

AMD@AMD · Apr 14

📢 ATTN: AI Developers. Are you ready to optimize, accelerate, and compete? Join the AMD Developer Challenge 2025: Inference Sprint. Push inference performance to the limit on the AMD ROCm software platform with cloud-based AMD MI300X. 🏆 $100K grand prize + $50K in additional…
