Luis
@lusxvr
ML Research @huggingface | CS @ TUM
Today, we are open-sourcing nanoVLM, a pure PyTorch library to train a Vision-Language Model from scratch in 750 lines of code. Training on one H100 for 6h, we get 35.3% on MMStar, matching SmolVLM-256M, which was trained with 100x more GPU hours. 👀 Even in a FREE Google Colab,…
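For intuition, here is a minimal, hedged sketch of the recipe such a from-scratch VLM follows (vision encoder → projector → language model). All class names, dimensions, and modules below are illustrative stand-ins, not nanoVLM's actual code:

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """Illustrative VLM skeleton: vision encoder -> projector -> language model."""
    def __init__(self, img_dim=768, lm_dim=576, vocab_size=32000, n_layers=4, n_heads=8):
        super().__init__()
        # Stand-ins for a pretrained ViT and a small decoder-only LM.
        self.vision_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=img_dim, nhead=n_heads, batch_first=True),
            num_layers=n_layers,
        )
        # The projector maps image patch features into the LM's embedding space.
        self.projector = nn.Linear(img_dim, lm_dim)
        self.token_embedding = nn.Embedding(vocab_size, lm_dim)
        # A real decoder would apply a causal mask; omitted here for brevity.
        self.language_model = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=lm_dim, nhead=n_heads, batch_first=True),
            num_layers=n_layers,
        )
        self.lm_head = nn.Linear(lm_dim, vocab_size)

    def forward(self, image_patches, input_ids):
        vision_tokens = self.projector(self.vision_encoder(image_patches))
        text_tokens = self.token_embedding(input_ids)
        # Prepend projected image tokens to the text sequence, then predict next tokens.
        hidden = self.language_model(torch.cat([vision_tokens, text_tokens], dim=1))
        return self.lm_head(hidden)

# Smoke test with random data: one image of 196 patches, 16 text tokens.
model = TinyVLM()
logits = model(torch.randn(1, 196, 768), torch.randint(0, 32000, (1, 16)))
print(logits.shape)  # (1, 196 + 16, 32000)
```

The projector is the only new glue between the two pretrained backbones, which is why the whole thing fits in a few hundred lines.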

Many VLMs claim to process hours of video. But can they follow the story?🤔 Today, we introduce TimeScope: The benchmark that separates true temporal understanding from marketing hype. Let's see how much VLMs really understand!⏳
Today, we're releasing an open-source async inference stack for all models currently hosted on @huggingface, powering the world's cutest robots, built with love by the team at @LeRobotHF. Details in 🧵
Thrilled to finally share what we've been working on for months at @huggingface 🤝 @pollenrobotics. Our first robot: Reachy Mini. A dream come true: cute and low-priced, hackable yet easy to use, powered by open source and the infinite community. Tiny price, small size, huge…
We’re releasing the top 3B model out there: SOTA performance. It has dual-mode reasoning (with or without think), extended long context up to 128k, and it’s multilingual with strong support for en, fr, es, de, it, pt. What more do you need? Oh yes, we’re also open-sourcing all…
"Why is the training so slow?" We figure out that starving the model from data, or providing it with padding tokens leads to training delays. We publish a write up which talks about data efficiency, and how we apply them to nanoVLM. Spoiler: We use knapsack algorithm. 🧵⤵️
Remarkable progress of the Hugging Face science team in 2025: Open-R1, smolagents, SmolVLM2, Ultra-Scale Playbook, OlympicCoder, Open Computer Agent, Reachy Mini, SmolVLA, LeRobot Hackathon and many more... A summary of the projects we've released so far this year 🧶
‼️ Sentence Transformers v5.0 is out! The biggest update yet introduces Sparse Embedding models, improvements to the encode methods, a Router module for asymmetric models & much more. Sparse + Dense = 🔥 hybrid search performance! Details in 🧵
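A hedged sketch of what sparse + dense hybrid scoring could look like, assuming the SparseEncoder class announced for v5.0 alongside the existing SentenceTransformer API; the model names and the naive weighted-sum fusion are illustrative choices, not recommendations from the release notes:

```python
from sentence_transformers import SentenceTransformer, SparseEncoder

docs = ["Sparse embeddings excel at exact keyword matching.",
        "Dense embeddings capture broader semantic similarity."]
query = "Which embeddings are good for exact keywords?"

# Dense retrieval with a standard bi-encoder.
dense = SentenceTransformer("all-MiniLM-L6-v2")
dense_scores = dense.similarity(dense.encode(query), dense.encode(docs))

# Sparse retrieval with a SPLADE-style model via the new SparseEncoder class.
sparse = SparseEncoder("naver/splade-cocondenser-ensembledistil")
sparse_scores = sparse.similarity(sparse.encode(query), sparse.encode(docs))

# Naive hybrid: weighted sum of the two score matrices (toy fusion; in practice
# the scores live on different scales and would be normalized or rank-fused).
hybrid_scores = 0.5 * dense_scores + 0.5 * sparse_scores
print(hybrid_scores)
```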
Can AI visualize solutions? 🧠👁️ Humans sketch things out in their minds to solve problems. What if Vision-Language Models could do something similar, not with full images, but with internal “mental sketches”? A new paper explores just that. Let's unpack it!
Your training pipeline is only as fast as your data pipeline. We (w/ @andimarafioti @lusxvr) are writing a blog post on an efficient multimodal data pipeline (images + text). It will be based on the latest addition to the nanoVLM repository. Keep an eye out. x.com/andimarafioti/…
🚀 Big nanoVLM Update: Train 4 models for the price of 1! We just introduced efficient multimodal data packing, making training 4x faster. Let me show you how 👇
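Before the thread, here is a toy sketch of the text side of packing: several samples concatenated into one fixed-length row, with a block-diagonal attention mask so tokens never attend across sample boundaries. The helper below is hypothetical, not nanoVLM's actual collator:

```python
import torch

def build_packed_batch(samples, max_len, pad_id=0):
    """Concatenate several tokenized samples into one fixed-length row and
    record a per-token document id so attention can be blocked across samples.
    Illustrative sketch of the packing idea, not nanoVLM's implementation."""
    tokens, doc_ids = [], []
    for doc, sample in enumerate(samples):
        tokens.extend(sample)
        doc_ids.extend([doc] * len(sample))
    tokens, doc_ids = tokens[:max_len], doc_ids[:max_len]
    pad = max_len - len(tokens)
    input_ids = torch.tensor(tokens + [pad_id] * pad)
    doc_ids = torch.tensor(doc_ids + [-1] * pad)   # -1 marks padding positions
    # Block-diagonal attention mask: a token may only attend within its own sample.
    attn_mask = (doc_ids[:, None] == doc_ids[None, :]) & (doc_ids[:, None] != -1)
    return input_ids, attn_mask

samples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
input_ids, attn_mask = build_packed_batch(samples, max_len=12)
print(input_ids)        # one packed row instead of three padded rows
print(attn_mask.int())  # block-diagonal: no cross-sample attention
```

Packing several short samples into each row is where the speedup comes from: the model spends its FLOPs on real tokens instead of padding.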