Matej Sirovatka
@m_sirovatka
MLE @HuggingFace
The biggest dataset of human-written GPU code, all open-source? 👀 YES please! We at @GPU_MODE have released around 40k 🚀 human-written code samples spanning Triton, HIP and PyTorch, and it's all open on the @huggingface Hub. Train the new GPT to make GPTs faster ⚡️ Link below ⬇️
Distributed training has its own dialect. I made a pocket dictionary so you don’t open 50 browser tabs every time a paper mentions “ZeRO-offload.” 49 terms, crisp definitions, diagrams where they actually help. Grab it, skim it, get back to training. distributedlexicon(.)com
We’re doing GPU code generation for stuff at @GPU_MODE and we had issues scaling concurrency. Yesterday @simonguozirui told me to use @modal_labs for rollout eval. Hacked it together in like 30 minutes and concurrency is not an issue anymore 🫡 @charles_irl any plans to get a cheaper plan…
GPU Mode all over the globe, this time in NYC with an amazing speaker list and a very cool hackathon track, courtesy of Jane Street! See you there 🫡
New @GPU_MODE x Jane Street 1-day GPU programming hackathon in-person in NYC! Talks by the wonderful @tri_dao, @soumithchintala, and other PyTorch folks! If you're at #ICML25, check out more information at the Jane Street booth! Register by Aug 17: bit.ly/3TS0d9I?r=qr
hey so uhhh, where do I send my job application?
Here’s a wrap-up of all our meeting rooms :)
Come meet us and chat (and listen to my yap)
sadly won’t be at ICML but I have 2 papers that you should check out! KernelBench, which @simonguozirui will be presenting at the main conference ^_^, + the @GPU_MODE leaderboard’s OSS infra at the CODEML workshop (7/19), which @m_sirovatka will be giving an oral for! Lots of 🍿!!
Insane results by the @Kimi_Moonshot team…now to the Muon math trenches we go 🫡
🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…
Really cool work from the @PrimeIntellect team releasing their dataset 🚀 And of course, the dataset is available on their @huggingface hub.
Releasing SYNTHETIC-2: our open dataset of 4m verified reasoning traces spanning a comprehensive set of complex RL tasks and verifiers. Created by hundreds of compute contributors across the globe via our pipeline parallel decentralized inference stack. primeintellect.ai/blog/synthetic…
This is quickly becoming the course with the most cracked minds in the industry 👀 (and me)
Considering the rapid growth in speakers and topics, we're undergoing a critical change. This c̶o̶u̶r̶s̶e̶ conference has some of the smartest minds who understand the issues of training at scale today. And we're going to help you understand them all! 1/5
MakoGenerate now supports custom problems, meaning you can generate #CUDA or #Triton kernels for any @PyTorch reference code you have! Let's walk through an example using @GPU_MODE's latest contest: the Triangle Multiplicative Update (TriMul) module.
My For You feed is full of the Soham guy (the one who worked at like all the startups at the same time), and to be fair, shouldn't YC hire him just to scout? Have we ever seen him apply to an unsuccessful company (yes, he applied to HF too)? I think that guy has a bright future ahead.
Has your company achieved anything without Soham applying?
You can utilize our Gemma 3n multimodal and fine-tuning Kaggle notebook for any submission to the $150,000 challenge! The $10,000 is specifically for the Unsloth track - but you can submit it for the main track as well! Kaggle notebook: kaggle.com/code/danielhan…
We’ve teamed up with @GoogleDeepMind for a challenge with a $10,000 Unsloth prize! 🦥 Show off your best fine-tuned Gemma 3n model using Unsloth, optimized for an impactful task. The entire hackathon has $150,000 prizes to be won! Kaggle notebook: kaggle.com/code/danielhan…
Nothing forces me into stuff better than someone just announcing it before I've said a final yes (for legal reasons, this didn't actually happen). Anyway, drop me ideas on what you'd like to learn about profiling/debugging 👀
BOOOM! transformers now has a baked-in HTTP server with an OpenAI-spec-compatible API. Launch it with `transformers serve` and connect your favorite apps. Here I'm running @jandotai with local transformers and hot-swappable models. There is preliminary tool-call support as well!
Wednesday, 11am. We're going to dive deep into @PyTorch hooks, cover the most common use cases, and learn all about them. I know many of you don't know how they work, so I expect some of you to come and learn 😉