Sylvain Gugger
@GuggerSylvain
Machine Learning at Jane Street. Previously at @huggingface and @fastdotai. Co-author of http://github.com/fastai/fastbook. He/him
A year and a half after starting the first draft of the first chapter, look what arrived in the mail!

Very excited to collaborate with Mark on this!
On Sep 6 in NYC, this won't be your typical hackathon where you do your own thing in a corner and then present at the end of the day. You'll deploy real models to the market, trades will happen, chaos should be expected. The fastest model is great, but time to market matters more.
The new transformers release comes w/ a surprise: kernels support ⚡️ It integrates deeply with precompiled kernels on the HF Hub.
- opt-in, automatic kernels for your hardware and software
- kernels like FA2/3 w/o compilation
- community-built kernels, for inference & training
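For a flavor of what this enables, here's a minimal sketch using the standalone kernels library that powers the integration; the repo name kernels-community/activation and the gelu_fast entry point follow that library's README, so treat them as assumptions rather than the exact transformers-side API:

```python
import torch
from kernels import get_kernel

# Fetch a precompiled kernel from the HF Hub -- no local compilation needed.
# Repo name and function follow the kernels README (an assumption here).
activation = get_kernel("kernels-community/activation")

x = torch.randn((1024, 1024), dtype=torch.float16, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)  # runs the downloaded CUDA kernel (output first)
```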
🦆🚀QuACK🦆🚀: a new SOL mem-bound kernel library without a single line of CUDA C++, all straight in Python thanks to CuTe-DSL. On an H100 with 3TB/s, it runs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 With @tedzadouri and @tri_dao
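For context, "SOL" means speed-of-light: a memory-bound kernel is judged by how close its achieved bandwidth gets to the hardware peak (~3 TB/s HBM on H100). A small sketch of measuring that metric with stock PyTorch (my illustration, not QuACK's API):

```python
import torch
import torch.nn.functional as F

# Speed-of-light metric for a mem-bound op: achieved bandwidth / peak bandwidth.
# RMSNorm is the kind of memory-bound op a library like QuACK targets; the
# stock PyTorch op is used here purely to show the measurement.
x = torch.randn(16384, 8192, device="cuda", dtype=torch.bfloat16)
F.rms_norm(x, (x.shape[-1],))  # warmup

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
F.rms_norm(x, (x.shape[-1],))
end.record()
torch.cuda.synchronize()

seconds = start.elapsed_time(end) / 1e3  # elapsed_time is in milliseconds
bytes_moved = 2 * x.numel() * x.element_size()  # one read + one write of x
print(f"{bytes_moved / seconds / 1e12:.2f} TB/s achieved vs ~3 TB/s peak")
```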
Thrilled to finally share what we've been working on for months at @huggingface 🤝@pollenrobotics Our first robot: Reachy Mini A dream come true: cute and low priced, hackable yet easy to use, powered by open-source and the infinite community. Tiny price, small size, huge…
🚨🔥 CUTLASS 4.0 is released 🔥🚨 pip install nvidia-cutlass-dsl 4.0 marks a major shift for CUTLASS: towards native GPU programming in Python docs.nvidia.com/cutlass/media/…
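The hello-world looks roughly like this; I'm reconstructing the decorator and launch names from the CUTLASS 4.0 CuTe-DSL quick start, so treat the exact API as an assumption:

```python
import cutlass
import cutlass.cute as cute

# A GPU kernel written entirely in Python via the CuTe DSL (names assumed
# from the quick-start docs).
@cute.kernel
def kernel():
    tidx, _, _ = cute.arch.thread_idx()
    if tidx == 0:
        cute.printf("Hello world")

# Host-side entry point: JIT-compiles and launches the kernel.
@cute.jit
def hello_world():
    cutlass.cuda.initialize_cuda_context()
    kernel().launch(grid=[1, 1, 1], block=[32, 1, 1])

if __name__ == "__main__":
    hello_world()
```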
Speculative Decoding before: limited choices, the draft model must have the same tokenizer 😬 Speculative Decoding now: unlimited choices, ANY draft model can be used, with better speedup opportunities 😎 The folks at Intel have been cooking, and Speculative Decoding (with…
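In transformers this is exposed through generate(): when the draft model uses a different tokenizer, you pass both tokenizers so draft tokens can be re-encoded between vocabularies. A sketch following the documented universal assisted decoding API; the checkpoint names are illustrative placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Target and draft models with *different* tokenizers (example checkpoints).
checkpoint = "google/gemma-2-9b"
assistant_checkpoint = "double7/vicuna-68m"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
assistant_tokenizer = AutoTokenizer.from_pretrained(assistant_checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")
assistant_model = AutoModelForCausalLM.from_pretrained(assistant_checkpoint, device_map="auto")

inputs = tokenizer("Alice and Bob", return_tensors="pt").to(model.device)

# Passing both tokenizers enables universal assisted decoding: draft tokens
# are translated into the target vocabulary before verification.
outputs = model.generate(
    **inputs,
    assistant_model=assistant_model,
    tokenizer=tokenizer,
    assistant_tokenizer=assistant_tokenizer,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```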
(1/7) Inspired by DeepSeek's FlashMLA, we're releasing ThunderMLA—a fused megakernel optimized for variable-prompt decoding! ⚡️🐱ThunderMLA is up to 35% faster than FlashMLA and just 400 LoC. Blog: bit.ly/4kubAAK With @AaryanSinghal4, @realDanFu, and @hazyresearch!
Write a fast kernel and run it on Discord. See how you compare against the best! If you're familiar with Leetcode, Kaggle, or Codeforces, then this should feel right at home.
🚀 Excited to release *THE* Ultra-Scale Playbook - a comprehensive guide on training LLMs from 1 to 1000s of GPUs!
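The playbook starts from the simplest scaling technique, data parallelism: replicate the model on every GPU and all-reduce the gradients. A minimal sketch with vanilla PyTorch DDP (my example, not the playbook's code):

```python
# Launch with: torchrun --nproc_per_node=8 train.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank % torch.cuda.device_count())

# Each rank holds a full replica; DDP all-reduces gradients during backward().
model = DDP(torch.nn.Linear(1024, 1024).cuda())
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(10):
    x = torch.randn(8, 1024, device="cuda")  # each rank sees its own data shard
    loss = model(x).square().mean()
    loss.backward()
    opt.step()
    opt.zero_grad()

dist.destroy_process_group()
```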
This is huge, huge, huge - DeepSpeed is now a community-owned project, as it's now a part of the Linux Foundation. Committer access should be possible now. Thank you, @MSFTResearch, for breathing life into this scalability framework that is so important to the ML community, and now…
🚀 Excited to introduce DeepSpeed, a deep learning optimization library from @Microsoft! It simplifies distributed training and inference, making AI scaling more efficient and cost-effective. Learn more 👉 hubs.la/Q0351DJC0 #DeepSpeed #AI #OpenSource #LFAIData
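The core of DeepSpeed's API is deepspeed.initialize, which wraps your model and optimizer according to a JSON-style config. A minimal ZeRO-2 sketch, assuming a toy model; the config values are illustrative:

```python
# Launch with: deepspeed --num_gpus=1 train.py
import torch
import deepspeed

model = torch.nn.Linear(1024, 1024)

# Illustrative config: ZeRO stage 2 shards optimizer state and gradients
# across data-parallel ranks to cut memory.
ds_config = {
    "train_batch_size": 4,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
}

model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(4, 1024, device=model_engine.device, dtype=torch.bfloat16)
loss = model_engine(x).square().mean()
model_engine.backward(loss)  # DeepSpeed handles loss scaling and reduction
model_engine.step()
```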
TIL Jane Street has an eng podcast. The most recent episode is with @GuggerSylvain on training & ML infra.
They have a nice blog about it: signalsandthreads.com
We had an awesome talk at Jane Street from the amazing @cHHillee on scaling ML systems to a trillion trillion FLOPs, and I just realized the recording is now online: youtu.be/139UPjoq7Kw?si…
Jane Street tech talks have always been super awesome. So I'm quite excited to be visiting Jane Street on Monday to give a talk on building ML systems for a trillion trillion FLOPs :) I'll talk about a bunch of fun things, including cool GPU optimizations, how I think about…
This is such a fun talk from @ixyene! All about system jitter and how to hunt it down. Also, it features a cameo appearance from magic-trace.org, my favorite profiling tool that no one has heard of. youtu.be/I_TtMk5z0O0?si…
PyTorch 2.5 is here 🔥 We are excited to announce the release of #PyTorch 2.5, featuring a new CuDNN backend for SDPA, regional compilation of torch.compile, & TorchInductor CPP backend performance speedup Read more in our blog: hubs.la/Q02TRs9p0
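The new cuDNN backend can be selected explicitly through the SDPA kernel context manager; a small sketch, with shapes chosen arbitrarily:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# (batch, heads, seq_len, head_dim) in a dtype the fused backends support
q, k, v = (
    torch.randn(2, 8, 1024, 64, device="cuda", dtype=torch.bfloat16)
    for _ in range(3)
)

# Restrict SDPA to the cuDNN backend added in PyTorch 2.5; the release
# highlights speedups from it on recent NVIDIA GPUs.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)
```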