Randall Balestriero
@randall_balestr
AI Researcher: From theory to practice (and back). Postdoc @MetaAI with @ylecun. PhD @RiceUniversity with @rbaraniuk. Masters @ENS_Ulm / @Paris_Sorbonne.
Impressed by DINOv2's performance, but don't want to spend too much $$$ on compute and wait days to pretrain on your own data? Say no more! A data augmentation curriculum speeds up SSL pretraining (as it did for generative and supervised learning) -> FastDINOv2! arxiv.org/abs/2507.03779
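One common form of augmentation curriculum is to ramp augmentation strength over training. The sketch below illustrates that pattern with a torchvision-style pipeline; the function name `augmentation_for_epoch`, the schedule, and every parameter value are hypothetical and not the FastDINOv2 recipe (see the paper for the actual setup).

```python
# Minimal sketch of a weak-to-strong data augmentation curriculum (assumed
# schedule and values; NOT the FastDINOv2 recipe -- see the paper for that).
import torchvision.transforms as T

def augmentation_for_epoch(epoch: int, total_epochs: int) -> T.Compose:
    """Build a view-generation pipeline whose strength grows with training progress."""
    progress = epoch / max(total_epochs - 1, 1)      # 0.0 at the start, 1.0 at the end
    min_crop_scale = 0.9 - 0.65 * progress           # crops become more aggressive
    jitter = 0.1 + 0.3 * progress                    # color jitter becomes stronger
    blur_prob = 0.1 + 0.4 * progress                 # blur is applied more often
    return T.Compose([
        T.RandomResizedCrop(224, scale=(min_crop_scale, 1.0)),
        T.RandomHorizontalFlip(),
        T.ColorJitter(brightness=jitter, contrast=jitter, saturation=jitter, hue=0.1),
        T.RandomApply([T.GaussianBlur(kernel_size=23)], p=blur_prob),
        T.ToTensor(),
    ])

# Usage: rebuild the augmentation pipeline at the start of every epoch.
total_epochs = 100
for epoch in range(total_epochs):
    transform = augmentation_for_epoch(epoch, total_epochs)
    # dataset.transform = transform, then run the usual SSL training step ...
```

Rebuilding the transform each epoch keeps the curriculum cheap to apply; the schedule itself (and whether strength should increase or decrease) is the part to take from the paper rather than from this sketch.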

I am heading to @icmlconf to present our position paper with @randall_balestr @klindt_david @wielandbr on what we believe are the important next steps to advance SSL. It's not either theory or practice: it's both. As a community, we need a better discussion.
Our paper "Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations" has been accepted to @ICCVConference! 🧵 TL;DR: Masked image models (like MAE) underperform not just because their features are weak, but because those features are aggregated poorly. [1/7]
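To make the aggregation point concrete, here is a minimal sketch (dummy shapes, illustrative names; not the paper's code) of how the same frozen ViT token features can be read out in different ways before a linear probe; `aggregate` and its modes are assumptions for illustration, not the method proposed in the paper.

```python
# Minimal sketch (dummy shapes, illustrative names; not the paper's code):
# the same frozen ViT tokens can be read out in several ways, and the choice
# of aggregation matters, not only the quality of the features themselves.
import torch

def aggregate(tokens: torch.Tensor, mode: str = "cls") -> torch.Tensor:
    """tokens: (batch, 1 + num_patches, dim), with the [cls] token first."""
    if mode == "cls":             # standard readout: keep only the [cls] token
        return tokens[:, 0]
    if mode == "mean_patches":    # average the patch tokens, ignore [cls]
        return tokens[:, 1:].mean(dim=1)
    if mode == "cls_plus_mean":   # concatenate both summaries
        return torch.cat([tokens[:, 0], tokens[:, 1:].mean(dim=1)], dim=-1)
    raise ValueError(f"unknown aggregation mode: {mode}")

# Example with random features: batch of 8 images, 196 patches, 768-dim tokens.
tokens = torch.randn(8, 197, 768)
for mode in ("cls", "mean_patches", "cls_plus_mean"):
    print(mode, tuple(aggregate(tokens, mode).shape))
```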
Language/tokens provide a compressed space that is aligned with current LLM evaluation tasks (see our Next Token Perception Score: arxiv.org/abs/2505.17169), while pixels are raw, unfiltered sensing of the world, known to be misaligned with perception tasks (see our paper with…
I've always found it puzzling that language models learn so much from next-token prediction, while video models learn so little from next-frame prediction. Maybe it's because LLMs are actually brain scanners in disguise. Idle musings in my new blog post: sergeylevine.substack.com/p/language-mod…