Luis
@lusxvr
ML Research @huggingface | CS @ TUM
Today, we are open-sourcing nanoVLM, a pure PyTorch library to train a Vision-Language Model from scratch in 750 lines of code. Training on one H100 for 6h, we get 35.3% on MMStar, matching SmolVLM-256M, which was trained with 100x more GPU hours. 👀 Even in a FREE Google Colab,…
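For intuition, here is a minimal, hedged sketch of the recipe such a from-scratch VLM follows (vision encoder → projector → language model). All class names, dimensions, and modules below are illustrative stand-ins, not nanoVLM's actual code:

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """Illustrative VLM skeleton: vision encoder -> projector -> language model."""
    def __init__(self, img_dim=768, lm_dim=576, vocab_size=32000, n_layers=4, n_heads=8):
        super().__init__()
        # Stand-ins for a pretrained ViT and a small decoder-only LM.
        self.vision_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=img_dim, nhead=n_heads, batch_first=True),
            num_layers=n_layers,
        )
        # The projector maps image patch features into the LM's embedding space.
        self.projector = nn.Linear(img_dim, lm_dim)
        self.token_embedding = nn.Embedding(vocab_size, lm_dim)
        # A real decoder would apply a causal mask; omitted here for brevity.
        self.language_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=lm_dim, nhead=n_heads, batch_first=True),
            num_layers=n_layers,
        )
        self.lm_head = nn.Linear(lm_dim, vocab_size)

    def forward(self, image_patches, input_ids):
        vision_tokens = self.projector(self.vision_encoder(image_patches))
        text_tokens = self.token_embedding(input_ids)
        # Prepend projected image tokens to the text sequence, then predict next tokens.
        hidden = self.language_model(torch.cat([vision_tokens, text_tokens], dim=1))
        return self.lm_head(hidden)

# Smoke test with random data: one image of 196 patches, 16 text tokens.
model = TinyVLM()
logits = model(torch.randn(1, 196, 768), torch.randint(0, 32000, (1, 16)))
print(logits.shape)  # (1, 196 + 16, 32000)
```

The projector is the only new glue between the two pretrained backbones, which is why the whole thing fits in a few hundred lines.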

Many VLMs claim to process hours of video. But can they follow the story?🤔 Today, we introduce TimeScope: The benchmark that separates true temporal understanding from marketing hype. Let's see how much VLMs really understand!⏳
Today, we're releasing an open-source async inference stack for all models currently hosted on @huggingface, powering the world's cutest robots, built with love by the team at @LeRobotHF. Details in 🧵
Thrilled to finally share what we've been working on for months at @huggingface 🤝 @pollenrobotics. Our first robot: Reachy Mini. A dream come true: cute and low-priced, hackable yet easy to use, powered by open source and the infinite community. Tiny price, small size, huge…
We’re releasing the top 3B model out there: SOTA performance. It has dual-mode reasoning (with or without think), extended long context up to 128k, and it’s multilingual with strong support for en, fr, es, de, it, pt. What more do you need? Oh yes, we’re also open-sourcing all…
"Why is the training so slow?" We figure out that starving the model from data, or providing it with padding tokens leads to training delays. We publish a write up which talks about data efficiency, and how we apply them to nanoVLM. Spoiler: We use knapsack algorithm. 🧵⤵️
Remarkable progress of the Hugging Face science team in 2025: Open-R1, smolagents, SmolVLM2, Ultra-Scale Playbook, OlympicCoder, Open Computer Agent, Reachy Mini, SmolVLA, LeRobot Hackathon and many more... A summary of the projects we've released so far this year 🧶
‼️ Sentence Transformers v5.0 is out! The biggest update yet introduces Sparse Embedding models, improvements to the encode methods, a Router module for asymmetric models & much more. Sparse + Dense = 🔥 hybrid search performance! Details in 🧵
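A hedged sketch of what sparse + dense hybrid scoring could look like, assuming the SparseEncoder class announced for v5.0 alongside the existing SentenceTransformer API; the model names and the naive weighted-sum fusion are illustrative choices, not recommendations from the release notes:

```python
from sentence_transformers import SentenceTransformer, SparseEncoder

docs = ["Sparse embeddings excel at exact keyword matching.",
        "Dense embeddings capture broader semantic similarity."]
query = "Which embeddings are good for exact keywords?"

# Dense retrieval with a standard bi-encoder.
dense = SentenceTransformer("all-MiniLM-L6-v2")
dense_scores = dense.similarity(dense.encode(query), dense.encode(docs))

# Sparse retrieval with a SPLADE-style model via the new SparseEncoder class.
sparse = SparseEncoder("naver/splade-cocondenser-ensembledistil")
sparse_scores = sparse.similarity(sparse.encode(query), sparse.encode(docs))

# Naive hybrid: weighted sum of the two score matrices (toy fusion; in practice
# the scores live on different scales and would be normalized or rank-fused).
hybrid_scores = 0.5 * dense_scores + 0.5 * sparse_scores
print(hybrid_scores)
```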
Can AI visualize solutions? 🧠👁️ Humans sketch things out in their minds to solve problems. What if Vision-Language Models could do something similar, not with full images, but with internal “mental sketches”? A new paper explores just that. Let's unpack it!
Your training pipeline is only as fast as your data pipeline. We (w/ @andimarafioti @lusxvr) are writing a blog post on an efficient multimodal data pipeline (images + text). It will be based on the latest addition to the nanoVLM repository. Keep an eye out. x.com/andimarafioti/…
🚀 Big nanoVLM Update: Train 4 models for the price of 1! We just introduced efficient multimodal data packing, making training 4x faster. Let me show you how 👇
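Before the thread, here is a toy sketch of the text side of packing: several samples concatenated into one fixed-length row, with a block-diagonal attention mask so tokens never attend across sample boundaries. The helper below is hypothetical, not nanoVLM's actual collator:

```python
import torch

def build_packed_batch(samples, max_len, pad_id=0):
    """Concatenate several tokenized samples into one fixed-length row and
    record a per-token document id so attention can be blocked across samples.
    Illustrative sketch of the packing idea, not nanoVLM's implementation."""
    tokens, doc_ids = [], []
    for doc, sample in enumerate(samples):
        tokens.extend(sample)
        doc_ids.extend([doc] * len(sample))
    tokens, doc_ids = tokens[:max_len], doc_ids[:max_len]
    pad = max_len - len(tokens)
    input_ids = torch.tensor(tokens + [pad_id] * pad)
    doc_ids = torch.tensor(doc_ids + [-1] * pad)   # -1 marks padding positions
    # Block-diagonal attention mask: a token may only attend within its own sample.
    attn_mask = (doc_ids[:, None] == doc_ids[None, :]) & (doc_ids[:, None] != -1)
    return input_ids, attn_mask

samples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
input_ids, attn_mask = build_packed_batch(samples, max_len=12)
print(input_ids)        # one packed row instead of three padded rows
print(attn_mask.int())  # block-diagonal: no cross-sample attention
```

Packing several short samples into each row is where the speedup comes from: the model spends its FLOPs on real tokens instead of padding.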