Orr Zohar
@orr_zohar
@nvidia • @Stanford • @KnightHennessy scholar • Researching large multimodal models
🧵 Introducing TimeScope, an open-source benchmark rigorously evaluating the true “temporal context window” of video-language models on videos ranging from 1 minute to 8 hours. #AI #MachineLearning
🧠 How can we truly test long-context video understanding in video-LMMs? ⏱️ TimeScope benchmarks models from 1 min to 8 hours using “needle-in-a-haystack” probes. 🚀 Gemini 2.5-Pro leads the pack—but even it struggles as context length grows. Long-range memory is still a…
Thrilled to announce our MiMo-VL series hit 100K downloads on HuggingFace last month! 🚀🚀 Incredible to see the community's enthusiasm for our VLMs. More exciting updates coming soon! 😜 huggingface.co/XiaomiMiMo/MiM…
timescope: testing whether large models actually understand long videos or just claim to 🤠 they randomly insert needles (short videos/static images) into long videos and ask questions about the needle itself 🤯 Gemini seems to be the best! very cool work by @orr_zohar et al 👏
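For readers curious about the mechanics: a minimal sketch of the needle-in-a-haystack idea described above. The frame counts, needle content, and question template are illustrative assumptions, not TimeScope's actual data pipeline.

```python
# Illustrative sketch of needle-in-a-haystack video probing: splice a short
# "needle" clip into a long "haystack" video at a random offset, then ask a
# question that can only be answered from the needle.
import random
import numpy as np

def insert_needle(haystack_frames: list, needle_frames: list, seed: int = 0):
    """Return the spliced frame sequence and the needle's start index."""
    rng = random.Random(seed)
    start = rng.randint(0, len(haystack_frames))
    spliced = haystack_frames[:start] + needle_frames + haystack_frames[start:]
    return spliced, start

# Toy data: a 1-hour haystack at 1 fps and a 10-second needle.
haystack = [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(3600)]
needle = [np.full((224, 224, 3), 255, dtype=np.uint8) for _ in range(10)]

video, needle_start = insert_needle(haystack, needle)
probe = {
    "question": "A short clip of a white screen appears somewhere in the video. "
                "Roughly when does it appear?",
    "answer_frame_range": (needle_start, needle_start + len(needle)),
}
print(len(video), probe["answer_frame_range"])
```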
SmolVLM has been accepted to @COLM_conf 2025 🥳! See you in Montreal!
Introducing the smollest VLMs yet! 🤏 SmolVLM (256M & 500M) runs on <1GB GPU memory. Fine-tune it on your laptop and run it on your toaster. 🚀 Even the 256M model outperforms our Idefics 80B (Aug '23). How small can we go? 👀
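A quick inference sketch for trying the small checkpoints via transformers. The Hub id and prompt format below follow the usual vision-to-seq chat flow and are assumptions; check the model card for the exact usage.

```python
# Minimal SmolVLM-style inference sketch with transformers.
# "HuggingFaceTB/SmolVLM-256M-Instruct" is an assumed Hub id.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-256M-Instruct"  # assumed model id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16)

image = Image.open("example.jpg")
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```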
Today, we are open-sourcing our pipeline to deduplicate large-scale image datasets. On one GPU, we can deduplicate 10k images against 1M indexed test images in ~60 seconds. But how?
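The tweet doesn't spell out the pipeline, but the usual recipe is: embed every image, index the reference set, and flag near-duplicates by cosine similarity. A sketch of that recipe with random vectors standing in for real encoder features; the FAISS index type and the 0.95 threshold are illustrative choices, not necessarily what the released pipeline uses.

```python
# Embedding-based image deduplication sketch: index reference embeddings,
# then screen query embeddings for near-duplicates by cosine similarity.
import numpy as np
import faiss

dim = 512
rng = np.random.default_rng(0)

# Reference embeddings (stand-ins for CLIP/SigLIP-style features),
# L2-normalized so inner product equals cosine similarity.
ref = rng.standard_normal((100_000, dim)).astype("float32")
ref /= np.linalg.norm(ref, axis=1, keepdims=True)
index = faiss.IndexFlatIP(dim)  # move to GPU with faiss.index_cpu_to_all_gpus(index)
index.add(ref)

# Query embeddings to screen against the indexed set.
query = rng.standard_normal((10_000, dim)).astype("float32")
query /= np.linalg.norm(query, axis=1, keepdims=True)

scores, ids = index.search(query, k=1)        # nearest reference image per query
duplicates = np.where(scores[:, 0] > 0.95)[0]  # assumed similarity threshold
print(f"{len(duplicates)} likely duplicates out of {len(query)}")
```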
Robotics models are increasingly bulky and difficult to run directly on robots. With @RemiCadene and the team @LeRobotHF and @huggingface we’re changing that. Introducing SmolVLA, a sub-500M VLA designed for efficient training and inference. A thread 🧵
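To make the VLA shape concrete, a toy PyTorch module showing the general idea (fuse image and instruction features, predict a short chunk of actions). This is a generic sketch, not SmolVLA's architecture or the LeRobot API.

```python
# Toy vision-language-action (VLA) illustration: encode an image and an
# instruction, fuse them, and regress a chunk of future robot actions.
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    def __init__(self, vis_dim=256, txt_dim=256, action_dim=7, chunk=8):
        super().__init__()
        self.vision = nn.Sequential(              # stand-in for a pretrained vision encoder
            nn.Conv2d(3, 32, 8, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, vis_dim))
        self.text = nn.EmbeddingBag(1000, txt_dim)  # stand-in for a language backbone
        self.head = nn.Sequential(                  # action head: predicts a chunk of actions
            nn.Linear(vis_dim + txt_dim, 512), nn.ReLU(),
            nn.Linear(512, action_dim * chunk))
        self.action_dim, self.chunk = action_dim, chunk

    def forward(self, image, token_ids):
        fused = torch.cat([self.vision(image), self.text(token_ids)], dim=-1)
        return self.head(fused).view(-1, self.chunk, self.action_dim)

policy = TinyVLA()
actions = policy(torch.randn(1, 3, 224, 224), torch.randint(0, 1000, (1, 12)))
print(actions.shape)  # (1, 8, 7): 8 future steps of a 7-DoF action
```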
WE ARE COOKING!! I’m looking for a creative engineer to join the ride 🤩 If that’s you, send me a message 🚀 You should be someone who learns tools fast, builds scrappy hacks when needed, and focuses on what works. You might be working in the space of media, image/video…
New open-source drop from the HF team - nanoVLM A super tight codebase to learn/train VLMs with good performance - inspired by @karpathy 's NanoGPT 750 lines of pytorch code. Training a 222M-parameter nanoVLM for 6 hours on a single H100 reaches 35.3% on MMStar, matching the…
Today, we are open-sourcing nanoVLM, a pure pytorch library to train a Vision-Language Model from scratch in 750 lines of code. Training on one H100 for 6h, we get 35.3% on MMStar, matching SmolVLM-256M which was trained with 100x more GPU hours. 👀 Even in a FREE Google Colab,…
Alert alert, we got our first external contribution to the nanoVLM project! Thank you, @not_so_lain !
BOOOM! Learn VLMs from inside out in < 1000 lines of pure PyTorch code! 🔥 github.com/huggingface/na…
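For context on what those ~750 lines wire together: a skeleton of the vision encoder → modality projector → language model pattern that nanoVLM implements. Shapes and module choices here are illustrative stand-ins; the real code lives at github.com/huggingface/nanoVLM.

```python
# Skeleton of a nanoVLM-style model: patch embeddings are projected into the
# LM's embedding space and concatenated with text tokens before decoding.
import torch
import torch.nn as nn

class MiniVLM(nn.Module):
    def __init__(self, vocab=32000, d_model=384):
        super().__init__()
        self.patch_embed = nn.Linear(16 * 16 * 3, d_model)   # stand-in ViT patchifier
        self.projector = nn.Linear(d_model, d_model)          # vision -> LM space
        self.tok_embed = nn.Embedding(vocab, d_model)
        self.decoder = nn.TransformerEncoder(                 # transformer stand-in for the LM
            nn.TransformerEncoderLayer(d_model, nhead=6, batch_first=True),
            num_layers=4)                                     # (no causal mask, for brevity)
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, patches, token_ids):
        img_tok = self.projector(self.patch_embed(patches))   # (B, n_patches, d)
        txt_tok = self.tok_embed(token_ids)                    # (B, seq_len, d)
        seq = torch.cat([img_tok, txt_tok], dim=1)             # image tokens first
        return self.lm_head(self.decoder(seq))                 # next-token logits

model = MiniVLM()
patches = torch.randn(2, 64, 16 * 16 * 3)     # 64 flattened 16x16 RGB patches
tokens = torch.randint(0, 32000, (2, 10))
print(model(patches, tokens).shape)           # (2, 74, 32000)
```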
Excited to present Video-STaR at #ICLR2025’s poster session tomorrow! 🗓️ Visit me at Poster 91, 10:00 AM–12:30 PM 🚀 Dive into our work on advancing video reasoning using self-training:
🚀 Can self-training improve general LVLM performance? 🏎️ How can you adapt your LVLMs to new and diverse applications? 📢 Happy to announce Video-STaR, a self-training approach to utilize any supervision for video instruction tuning! 🧵👇
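A schematic of the self-training (STaR-style) loop behind this: generate candidate answers, keep only those consistent with existing supervision, and fine-tune on the verified set. The stub functions below are hypothetical placeholders, not the Video-STaR implementation.

```python
# Generate -> verify -> fine-tune loop for self-training with weak labels.
from dataclasses import dataclass

@dataclass
class Example:
    video_id: str
    question: str
    label: str  # any existing supervision (e.g. action class, caption)

def generate_answer(model_state: dict, ex: Example) -> str:
    # Placeholder for LVLM inference over the video + question.
    return model_state.get(ex.video_id, "unknown")

def label_verified(answer: str, ex: Example) -> bool:
    # Placeholder verifier: keep the answer only if it agrees with the label.
    return ex.label.lower() in answer.lower()

def fine_tune(model_state: dict, verified: list) -> dict:
    # Placeholder for an instruction-tuning step on the verified generations.
    return {**model_state, **{ex.video_id: ans for ex, ans in verified}}

dataset = [Example("vid_0", "What is the person doing?", "cooking")]
model_state = {"vid_0": "The person is cooking pasta."}

for round_idx in range(3):  # a few self-training rounds
    candidates = [(ex, generate_answer(model_state, ex)) for ex in dataset]
    verified = [(ex, ans) for ex, ans in candidates if label_verified(ans, ex)]
    model_state = fine_tune(model_state, verified)
    print(f"round {round_idx}: kept {len(verified)}/{len(dataset)} generations")
```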