Andi Marafioti
@andimarafioti
cooking multimodal models @huggingface
🚀We just dropped SmolDocling: a 256M open-source vision LM for complete document OCR!📄✨ It's lightning fast, processing a page in 0.35 sec on a consumer GPU using < 500MB VRAM⚡ SOTA in document conversion, beating every competing model we tested up to 27x larger🤯 But how? 🧶⬇️

Lovely PyTorch bug last night: if you're using multiple workers in a DataLoader, each one must yield samples — you can't have some just doing prep work. Learned that the hard way 😂
It’s time for the American AI community to wake up, drop the "open is not safe" bullshit, and return to its roots: open science and open-source AI, powered by an unmatched community of frontier labs, big tech, startups, universities, and non‑profits. If we don’t, we’ll be forced…
Love to see this from @WhiteHouse!
I’m so psyched that Nico is joining Hugging Face! I met him at @uphillconf where he gave the closing talk and completely blew me away. Follow him to see how mesmerizing WebML can be!
Beyond happy to announce that I'm joining 🤗 @huggingface as a #MachineLearningEngineer focused on #WebML!
Big congrats to Cohere Labs, they are doing amazing work!
Sometimes it is important to take a moment and celebrate -- we achieved all of this in 3 years. Pretty incredible impact from @Cohere_Labs 🔥
I did not notice this until just now. Thank you @andimarafioti for the recommendation! Very glad that even though Eagle 2 is not our latest work, people still find it very useful.
The Eagle 2 paper from Nvidia is such a goldmine.
Introducing ColQwen-Omni, a 3B omnimodal retriever that extends the ColPali concept of multimodal retrieval with late interaction to audio chunks and short videos, with no performance degradation on visual document retrieval wrt our best models! (1/N)
I spoke to Hugging Face cofounder @Thom_Wolf about the Reachy Mini, and the company's bet on cute, desktop robotic devices to bring open source AI models into people's homes. TBH, I think the Reachy Mini is one of a few AI devices people I know are really excited about.
HF's got a couple papers at COLM 2025 covering all stages of open model life! 📚Data: FineWeb2 (led by @HKydlicek and @gui_penedo) 🧱Model creation: SmolLM2 (led by @LoubnaBenAllal1) and SmolVLM (led by @andimarafioti) 🧪Evals: YourBench (led by @sumukx) Good job team! 🎉
We're open-sourcing "The Amazing Hand", an eight-degree-of-freedom humanoid robot hand compatible with @lerobot that can be 3D-printed at home for less than $250 ✌️✌️✌️ Given the success of Reachy Mini (2,000+ robots sold in a few days), we won't have the bandwidth to…
1T parameters, open-weights, just released on @huggingface!
Introducing SmolTalk2: the dataset behind SmolLM3's dual reasoning. - mid-training → 5M samples - SFT data → 3M samples - preferences for APO → 500k samples It combines open datasets with new ones curated for strong think and no_think performance. hf.co/datasets/Huggi…
Happy to announce 🤗Datasets 4! We've added the most requested feature 👀 Introducing streaming data pipelines for Hugging Face Datasets ✨ With support for large, multimodal datasets in any standard file format, and with num_proc= for speed⚡
Opening orders for Reachy Mini today, our open-source desktop robot for AI builders, starting at $299! Fully integrated with @LeRobotHF & @huggingface for the whole community to build AI apps for it (like this dancing one). We'll probably ship a first batch of a hundred this…
FineWeb2 🥂 has been accepted to @COLM_conf See you in October 🇨🇦
We have finally released the 📝paper for 🥂FineWeb2, our large multilingual pre-training dataset. Along with general (and exhaustive) multilingual work, we introduce a concept that can also improve English performance: deduplication-based upsampling, which we call rehydration.
Thrilled to finally share what we've been working on for months at @huggingface 🤝@pollenrobotics Our first robot: Reachy Mini A dream come true: cute and low priced, hackable yet easy to use, powered by open-source and the infinite community. Tiny price, small size, huge…