Nouamane Tazi
@Nouamanetazi
ML Research Engineer @huggingface 🤗. Scale it 'til you make it 🇵🇸🕊
SmolLM3 is out! Proud to have led the distributed work on this one. 💪🏻 So many mishaps and stories to tell, stay tuned for more details soon.. 👀 Everything open-sourced as usual for you to reproduce your own LLM training confidently: huggingface.co/blog/smollm3 🤗

We've just released 100+ intermediate checkpoints and our training logs from the SmolLM3-3B training. We hope this can be useful to researchers working on mech interp, training dynamics, RL and other topics :) Training logs: -> Usual training loss (the gaps in the loss are due…
Super excited to share SmolLM3, a strong new 3B model. SmolLM3 is fully open: we share the recipe, the dataset, the training codebase and much more! > Trained on 11T tokens on 384 H100s for 220k GPU hours > Supports long context up to 128k thanks to NoPE and intra-document masking >…
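(For readers outside the thread: "intra-document masking" means that when several documents are packed into one training sequence, each token only attends to earlier tokens from its own document. A minimal sketch of such a mask, assuming a per-token document-id tensor; this is an illustration, not the SmolLM3 training code.)

```python
import torch

def intra_document_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    """Build a boolean attention mask for one packed sequence.

    doc_ids: (seq_len,) tensor where tokens from the same document share an id.
    Returns a (seq_len, seq_len) mask that is True where attention is allowed,
    i.e. causal AND restricted to tokens of the same document.
    """
    seq_len = doc_ids.size(0)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    same_doc = doc_ids.unsqueeze(0) == doc_ids.unsqueeze(1)
    return causal & same_doc

# Example: two documents of lengths 3 and 2 packed into one sequence.
print(intra_document_mask(torch.tensor([0, 0, 0, 1, 1])).int())
```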
Introducing SmolLM3: a strong, smol reasoner! > SoTA 3B model > dual mode reasoning (think/no_think) > long context, up to 128k > multilingual: en, fr, es, de, it, pt > fully open source (data, code, recipes) huggingface.co/blog/smollm3
It's out finallyyyy👌🏻
Today, we are excited to launch Trackio, a lightweight experiment tracking and visualization library — written in <1,000 lines of Python — that is completely open-source, 100% free to use, locally or hosted.
This is the *real* impact imo. Always a pleasure to hear such feedback. 🤗 I'm also very excited about scaling RL workloads...
I like the Ultra-Scale Playbook from @huggingface and give it to my MS/first-year PhD students to read as a prereq huggingface.co/spaces/nanotro… Is there an "RLSys" version of this on scaling RL+LLM training? If not + there's OSS community interest, I'll prob write one?
Announcing Falcon-Edge: a series of powerful, universal and fine-tunable BitNet models for everyone! We also release a Python fine-tuning toolkit, `onebitllms`, specialized for BitNet models. Announcement blogpost: falcon-lm.github.io/blog/falcon-ed…
🔥 Evaluating LLMs? You need Lighteval — the fastest, most flexible toolkit for benchmarking models, built by @huggingface Now with: ✅ Plug & play custom model inference (evaluate any backend) 📈 Tasks like AIME, GPQA:diamond, SimpleQA, and hundreds more Details below 🧵👇
Can you beat Qwen3 in a race across Wikipedia? 🏁 Go head-to-head with Qwen, Gemma, and DeepSeek as you race from Pokémon → Jennifer Aniston → anywhere you like. 🧵
We're launching Computer Use in smolagents! 🥳 -> As vision models become more capable, they become able to power complex agentic workflows. Especially Qwen-VL models, which support built-in grounding, i.e. the ability to locate any element in an image by its coordinates, and thus to…
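(To make the grounding idea concrete, here is a deliberately minimal sketch of the screenshot -> locate-by-coordinates -> act loop such an agent runs. `locate_element` is a hypothetical placeholder for a call to a grounding-capable VLM; it is not the smolagents API.)

```python
import pyautogui
from PIL import ImageGrab

def locate_element(screenshot, query: str) -> tuple[int, int]:
    """Hypothetical placeholder: ask a grounding-capable VLM (e.g. a Qwen-VL
    model) for the pixel coordinates of the UI element described by `query`.
    The real smolagents Computer Use integration is not shown here."""
    raise NotImplementedError

screenshot = ImageGrab.grab()                       # capture the current screen
x, y = locate_element(screenshot, "the 'Submit' button")
pyautogui.click(x, y)                               # act on the located coordinates
```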
Hey there! The Ultra-Scale Playbook is a detailed open-source guide published by Hugging Face. It explains the methods and technology involved in efficiently training large language models (LLMs) across many GPUs, often called GPU clusters. The playbook covers topics like…
DeepMind’s How to Scale and HuggingFace’s Ultra-Scale Playbook were super helpful. If you are interested in training large models, go read them now!
We built custom sparse all-to-all kernels on NVSHMEM that split operations into send/receive components, implement minimal synchronization, and support GPU-initiated communication. This enables efficient Expert Parallel inference on NVLink and CX-7 and is EFA-compatible.
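(The kernels above are CUDA/NVSHMEM and are not shown here. As a conceptual stand-in only, this Python sketch performs the same expert-parallel dispatch with plain torch.distributed.all_to_all_single, exchanging variable-sized groups of routed tokens between ranks.)

```python
import torch
import torch.distributed as dist

def dispatch_tokens(tokens_per_dest: list[torch.Tensor]) -> torch.Tensor:
    """Send tokens_per_dest[d] (tokens routed to experts on rank d) to rank d
    and receive whatever other ranks routed to this rank's local experts.
    Plain torch.distributed sketch; assumes an initialized process group and
    tensors on the device the backend requires (e.g. CUDA for NCCL)."""
    world = dist.get_world_size()
    send = torch.cat(tokens_per_dest)                     # (sum_d n_d, hidden)
    device = send.device
    in_splits = [t.size(0) for t in tokens_per_dest]
    # First exchange the split sizes so every rank knows how much it receives.
    out_splits = torch.empty(world, dtype=torch.long, device=device)
    dist.all_to_all_single(out_splits,
                           torch.tensor(in_splits, dtype=torch.long, device=device))
    # Then exchange the token payloads with variable-sized splits.
    recv = torch.empty(int(out_splits.sum()), send.size(1),
                       dtype=send.dtype, device=device)
    dist.all_to_all_single(recv, send,
                           output_split_sizes=out_splits.tolist(),
                           input_split_sizes=in_splits)
    return recv
```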
the only way i'm posting grad_norm plot from now on
I'm using muon, and my grad norm is randomly forming an M shape, wtf
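(Context for the Muon mention: Muon orthogonalizes each 2D momentum/update matrix with a Newton-Schulz iteration before applying it, which is one reason its gradient-norm traces can look unlike AdamW's. A sketch of that step, following the coefficients in the public reference implementation; not the poster's training code.)

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately replace a 2D update matrix G by its nearest
    orthogonal(-ish) matrix, as used in Muon. Coefficients follow the
    public reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + 1e-7)          # normalize so the iteration converges
    transposed = G.size(0) > G.size(1)
    if transposed:                      # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```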
Lecture 16: Parallelism and Scaling youtu.be/Mpg1YJfAEH0 - Basics of training on one device - Parallelization on multiple devices (e.g., data, tensor, pipeline parallel) - Combining and comparing strategies
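(A tiny companion to the lecture topics: the core of data parallelism is just averaging gradients across replicas after each local backward pass. Hypothetical illustration, not the course code; real frameworks overlap this communication with the backward pass.)

```python
import torch
import torch.distributed as dist

def data_parallel_step(model: torch.nn.Module, loss: torch.Tensor) -> None:
    """Each rank computes gradients on its own micro-batch, then gradients
    are averaged across ranks so every replica applies the same update."""
    loss.backward()
    world = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world
```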
Excited to teach Advanced NLP at CMU this semester! Slides are on the course page as the course proceeds: cmu-l3.github.io/anlp-spring202… Lectures will be uploaded to Youtube: youtube.com/playlist?list=…
Small util merged in transformers, open to contributions to extend it to all models! For now I've tested `gemma3`, `gemma2`, `paligemma` and `mistral`! Curious to see some of the more special ones 👀 (mllama? Qwen-Audio? Whisper? Qwen-VL?)
👀
.@Thom_Wolf on the Boom project, training a 70-100B parameter model in a decentralized setup
The template behind The Ultra-Scale Playbook is out ✨ Open-sourcing everything ftw 🔥
Introducing the @distillpub Blog Template on the Hub! Can we bring back the good old distill.pub days with super educational, well-explained posts? We used this template for the FineWeb and Ultra-Scale Playbook blog posts and want you to write similar blogs!…