Anton Lozhkov
@anton_lozhkov
Open-sourcing Language Models @huggingface ✨
Introducing 📐FineMath: the best open math pre-training dataset with 50B+ tokens! Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH. Here’s a breakdown 🧵
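If you want a quick look at the data before launching a run, here's a minimal sketch that streams it with 🤗 Datasets; the repo id, config name, and "text" column are assumptions on my part, so check the dataset card for the exact names.

```python
from datasets import load_dataset

# "HuggingFaceTB/finemath" and the "finemath-4plus" config are assumed names;
# check the dataset card for the actual repo id and available configs.
ds = load_dataset("HuggingFaceTB/finemath", "finemath-4plus", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example["text"][:300])  # peek at a few documents without downloading 50B+ tokens
    if i == 2:
        break
```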

Gemma3 technical report detailed analysis 💎
1) Architecture choices:
> No more softcapping, replaced by QK-Norm
> Both pre AND post norm
> Wider MLP than Qwen2.5, ~same depth
> SWA with a 5:1 ratio and a 1024 window (very small, and a cool ablation in the paper!)
> No MLA to save KV cache, SWA do…
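The QK-Norm point is worth unpacking. Here's a toy PyTorch sketch of the idea: normalize queries and keys (RMSNorm here) before the dot product, instead of softcapping the attention logits. Single head, no RoPE, no sliding window; the dimensions and placement are illustrative, not Gemma3's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        # scale each vector by the reciprocal of its RMS over the last dim
        return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps) * self.weight

class QKNormAttention(nn.Module):
    """Toy single-head attention with QK-Norm instead of logit softcapping."""
    def __init__(self, dim: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim, bias=False)
        self.k_proj = nn.Linear(dim, dim, bias=False)
        self.v_proj = nn.Linear(dim, dim, bias=False)
        self.q_norm = RMSNorm(dim)
        self.k_norm = RMSNorm(dim)

    def forward(self, x):
        q = self.q_norm(self.q_proj(x))  # normalize queries...
        k = self.k_norm(self.k_proj(x))  # ...and keys, so the logits stay bounded
        v = self.v_proj(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v

x = torch.randn(2, 16, 64)           # (batch, seq, dim)
print(QKNormAttention(64)(x).shape)  # torch.Size([2, 16, 64])
```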
We're releasing SmolTalk2: the dataset we used to post-train SmolLM3-3B! Our model wouldn't be fully open-source without the dataset we used to train it, so we're including all our processed data with the details to replicate our post-training. huggingface.co/datasets/Huggi… (1/3)
You've asked and we delivered! SmolLM3 with looong context, reasoning and multiple languages 😍😍😍
Introducing SmolLM3: a strong, smol reasoner!
> SoTA 3B model
> dual mode reasoning (think/no_think)
> long context, up to 128k
> multilingual: en, fr, es, de, it, pt
> fully open source (data, code, recipes)
huggingface.co/blog/smollm3
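If you want to poke at the dual-mode behaviour, here's a minimal sketch with transformers; the HuggingFaceTB/SmolLM3-3B repo id and the enable_thinking template flag are assumptions on my part, so check the blog post and model card for the exact interface.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed repo id, check the blog post
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 23?"}]
# enable_thinking toggles the reasoning ("think") mode in some chat templates;
# its availability here is an assumption.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, enable_thinking=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```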
Remarkable progress of the Hugging Face science team in 2025: Open-R1, smolagents, SmolVLM2, Ultra-Scale Playbook, OlympicCoder, Open Computer Agent, Reachy Mini, SmolVLA, LeRobot Hackathon and many more... A summary of the projects we released so far this year🧶
We have finally released the 📝paper for 🥂FineWeb2, our large multilingual pre-training dataset. Along with general (and exhaustive) multilingual work, we introduce a concept that can also improve English performance: deduplication-based upsampling, which we call rehydration.
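To make "rehydration" concrete, here's a toy sketch of the idea: instead of keeping exactly one copy per duplicate cluster, upsample the surviving copy as a function of how many near-duplicates it had. The log weighting and the cap below are illustrative choices on my part, not the paper's exact recipe.

```python
import math

def rehydrate(docs_with_counts, cap=8):
    """Toy deduplication-based upsampling ("rehydration").

    docs_with_counts: list of (document, duplicate_cluster_size) pairs,
    i.e. one surviving copy per cluster plus how many near-duplicates it had.
    """
    upsampled = []
    for doc, cluster_size in docs_with_counts:
        # heavily duplicated documents get repeated more, on the assumption
        # that duplication on the web correlates with usefulness
        repeats = min(cap, 1 + int(math.log2(max(cluster_size, 1))))
        upsampled.extend([doc] * repeats)
    return upsampled

corpus = [("rare page", 1), ("popular tutorial", 120), ("boilerplate page", 4)]
print(len(rehydrate(corpus)))  # the popular tutorial contributes the most copies
```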
NCCL sending the loss value from the last pipeline parallel stage back to rank 0 so the user can print it
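For anyone who hasn't watched this in a profiler, here's a minimal torch.distributed sketch of the pattern the meme is poking at: a single float making the trip from the last pipeline stage so rank 0 can print it. The single process group and rank layout are simplified assumptions, not any particular framework's implementation.

```python
from typing import Optional

import torch
import torch.distributed as dist

def report_loss(loss: Optional[torch.Tensor], last_stage_rank: int, device: torch.device):
    """Ship the scalar loss from the last pipeline-parallel stage back to rank 0.

    Sketch only: assumes one process group, last_stage_rank != 0, and that
    `loss` is non-None only on the last stage.
    """
    rank = dist.get_rank()
    buf = torch.zeros(1, device=device)
    if rank == last_stage_rank:
        buf.copy_(loss.detach())
        dist.send(buf, dst=0)                  # a single float rides over NCCL
    elif rank == 0:
        dist.recv(buf, src=last_stage_rank)
        print(f"step loss: {buf.item():.4f}")  # now the user can see the loss
```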
It's all well and good that OpenAI acquired Windsurf for $3 billion, probably for their massive repository of source code data. But have you heard of BigCode? 🧵 Here's why BigCode matters:
🧠 LLM inference isn't just about latency: it's about consistency under load. Different workloads, configs, and hardware = very different real-world performance. At Hugging Face 🤗 we built inference-benchmarker, a simple tool to stress-test LLM inference servers. 🧵 (1/2)
Generating high-quality code is the basis not only for code assistants but also for almost all agentic-AI approaches. That's why I'm very excited to see 2025 shaping up to be the year of high-performance code generation in *open-source* LLMs. After our latest release 'OlympicCoder' beat…
Build your code assistant at home with our new code pretraining datasets:
📚 Stack-Edu – 125B tokens of educational code across 15 programming languages, aka the FineWeb-Edu of code
🐛 GitHub Issues – 11B tokens of discussions from GitHub issues
📊 Kaggle Notebooks – 2B tokens…
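As a rough sketch of wiring these into a pretraining mix with 🤗 Datasets: the repo ids, config, column names, and mixing weights below are all assumptions on my part, so treat this as the shape of the pipeline rather than a recipe.

```python
from datasets import load_dataset, interleave_datasets

# All repo ids, config names, and column names here are assumed; check each
# dataset card for the real ones before running.
stack_edu = load_dataset("HuggingFaceTB/stack-edu", "python", split="train", streaming=True)
issues = load_dataset("HuggingFaceTB/github-issues", split="train", streaming=True)

# Keep a single shared "text" column so the two streams interleave cleanly.
stack_edu = stack_edu.select_columns(["text"])
issues = issues.select_columns(["text"])

# Rough token-weighted mix (~125B vs ~11B tokens); the probabilities are illustrative.
mix = interleave_datasets([stack_edu, issues], probabilities=[0.92, 0.08], seed=42)
print(next(iter(mix))["text"][:200])
```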
It's pretty outrageous that a 250M parameter model can correctly convert screenshots of quantum field theory equations to LaTeX 🤯 Wish I had this when I was a student!
Introducing: ⚡️OlympicCoder⚡️ Beats Claude 3.7 and is close to o1-mini/R1 on olympiad-level coding with just 7B parameters! Let that sink in! Read more about its training dataset, the new IOI benchmark, and more in Open-R1 progress report #3.
Have we found a way to beat DeepSeek-R1? 💣 Check hf.co/blog/open-r1/u… 🧵[0/10] Let's dive into our latest progress in Open R1.
🚀 New dataset drop: DCLM-Edu We filtered DCLM using FineWeb-Edu’s classifier to create a cleaner dataset optimized for smol models (like SmolLM2 135M/360M). Why? Small models are sensitive to noise and can benefit from heavily curated data.
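For the curious, here's a minimal sketch of that style of classifier-based filtering; the classifier repo id, the single regression head returning a roughly 0-5 educational-value score, and the threshold of 3 are assumptions on my part, so check the classifier's model card.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Repo id, score scale, and threshold are assumptions; check the model card.
clf_id = "HuggingFaceFW/fineweb-edu-classifier"
tok = AutoTokenizer.from_pretrained(clf_id)
clf = AutoModelForSequenceClassification.from_pretrained(clf_id)

def edu_score(texts):
    """Score a batch of documents for educational value (higher = more educational)."""
    batch = tok(texts, truncation=True, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = clf(**batch).logits.squeeze(-1)  # single regression head assumed
    return logits.tolist()

docs = ["Intro to linear algebra: a vector space is ...", "BUY CHEAP WATCHES NOW!!!"]
scores = edu_score(docs)
kept = [d for d, s in zip(docs, scores) if s >= 3.0]  # keep only "educational enough" docs
print(kept)
```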
✨NEW in @huggingface Datasets v3.3 🔥 Process datasets using async functions in .map()! Crazy useful for calling AI models like R1 from @deepseek_ai... maybe to fine-tune smaller models later? Screenshot of the full colab in the comments
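Here's a minimal sketch of an async .map(); the remote model call is stubbed out with asyncio.sleep, so swap in whichever async client you actually use (e.g. to query R1 behind an inference API).

```python
import asyncio
from datasets import Dataset

ds = Dataset.from_dict({"prompt": ["1+1?", "Capital of France?", "Name a prime."]})

async def annotate(example):
    # Stand-in for an async call to a model endpoint; replace the sleep
    # with your actual async client call.
    await asyncio.sleep(0.1)
    example["completion"] = f"<answer to: {example['prompt']}>"
    return example

# Since Datasets v3.3, .map() accepts async functions and runs them concurrently.
annotated = ds.map(annotate)
print(annotated[0])
```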
Over 1M downloads for SmolLM2 360M in the past month 🚀 Curious what your main use cases are if you're using the model.
Just get PTO on Friday and read this instead. > Reading time: 2-4 days.
🚀 Excited to release *THE* Ultra-Scale Playbook - a comprehensive guide on training LLMs from 1 to 1000s of GPUs!