TimDarcet
@TimDarcet
PhD student, building big vision models @ INRIA & FAIR (Meta)
1/ This week we released DINOv2: a series of general vision encoders pretrained without supervision. Good out-of-the-box performance on a variety of domains, matching or surpassing other publicly available encoders.
Why does Meta open-source its models? I talked about it with @kawecki_maciej looking at Dino, our computer vision model with applications in forest mapping, medical research, agriculture and more. Open-source boosts AI access, transparency, and safety. youtube.com/watch?v=eNGafi…
~400 people joined us on Sunday at the @Cohere_Labs Open Science Community ML Summer School. @TimDarcet, as always, delivered a super amazing talk on Scaling Self-Supervised Learning (SSL, DINOv2, Masked Image Modeling, CAPI). Super interesting session.
Hey I'm a doctor now, neat
🚨New doctor in the house!🚨 Congrats to @TimDarcet for his tremendous work (DINOv2, registers, CAPI) & successful PhD defense followed by ~2 hrs of questions -- he's got stamina! Congrats to his incredible team of advisors from Inria & Meta: @julienmairal @p_bojanowski M. Oquab
I already know who Jiahui is you don’t have to tell me
On the left is Ronaldo; Real Madrid spent $80M to sign him from Man United. On the right is Jiahui Yu; Meta paid $100M to sign him from OpenAI.
In case there is any ambiguity: DINOv2 is 100% a product of dumb hill-climbing on ImageNet-1k kNN accuracy (and linear too). Overfitting an eval can be bad. But sometimes the reward signal is reliable, and leads to truly good models. It's about finding a balance.
Oh I am a big fan of self-supervised learning. Also SSL has never been benchmark-maxing on ImageNet afaik. I am mainly complaining about the supervised classification ImageNet hill climb.
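For anyone wondering what that hill-climbing signal actually is, here is a minimal sketch of kNN evaluation on frozen features. This is the generic recipe, not the exact DINOv2 eval code; all names are placeholders.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def knn_accuracy(train_feats, train_labels, test_feats, test_labels, k=20):
    """Similarity-weighted kNN classification on frozen encoder features.
    Generic sketch of the usual protocol, not the exact DINOv2 eval code."""
    train_feats = F.normalize(train_feats, dim=1)
    test_feats = F.normalize(test_feats, dim=1)
    num_classes = int(train_labels.max()) + 1
    sims = test_feats @ train_feats.T                       # cosine similarities
    topk_sims, topk_idx = sims.topk(k, dim=1)
    topk_labels = train_labels[topk_idx]                    # (num_test, k)
    votes = torch.zeros(len(test_feats), num_classes, device=sims.device)
    votes.scatter_add_(1, topk_labels, topk_sims)           # similarity-weighted votes
    preds = votes.argmax(dim=1)
    return (preds == test_labels).float().mean().item()
```

The linear probe is the same idea with a logistic-regression head trained on the frozen features instead of the vote.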
FFS @huggingface please stop doing that it makes you look like pretentious assholes
HF stealing all generic (pypi) package names
Great summary of dino.txt by Fede! Drop by the poster if you're at CVPR! 📅 Sunday, June 15 🕥 10:30 - 12:30 📍 Poster 370
DINOv2 meets text at #CVPR 2025! Why choose between high-quality DINO features and CLIP-style vision-language alignment? Pick both with dino.txt 🦖📖 We align frozen DINOv2 features with text captions, obtaining both image-level and patch-level alignment at a minimal cost. [1/N]
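Not the dino.txt recipe itself, just to make the setup concrete: a generic CLIP-style alignment sketch where the vision backbone stays frozen and only small projections (plus the text side) are trained with a contrastive loss. All class and argument names here are placeholders I made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrozenVisionTextAlign(nn.Module):
    """Generic sketch: align a frozen vision encoder with text via a CLIP-style
    contrastive loss, training only the projections and the text encoder.
    Illustrative placeholder, not the dino.txt implementation."""
    def __init__(self, vision_encoder, text_encoder, vis_dim, txt_dim, embed_dim=512):
        super().__init__()
        self.vision_encoder = vision_encoder.eval()
        for p in self.vision_encoder.parameters():
            p.requires_grad_(False)                         # keep vision features frozen
        self.text_encoder = text_encoder
        self.vis_proj = nn.Linear(vis_dim, embed_dim)
        self.txt_proj = nn.Linear(txt_dim, embed_dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.6593))  # ~log(1/0.07), CLIP-style

    def forward(self, images, text_tokens):
        with torch.no_grad():
            v = self.vision_encoder(images)                 # (B, vis_dim) image-level features
        t = self.text_encoder(text_tokens)                  # (B, txt_dim)
        v = F.normalize(self.vis_proj(v), dim=-1)
        t = F.normalize(self.txt_proj(t), dim=-1)
        logits = self.logit_scale.exp() * v @ t.T
        targets = torch.arange(len(images), device=logits.device)
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
```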
Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
Two things teach intellectual humility: people smarter than you, and maths. Doing math with people smarter than you is sort of a bit too much.
So the reason I was asking about this is that the squared L2 has the very pleasant property of reducing to "just push away from the avg", and that would eliminate all batch-size issues (you can use an EMA avg). It's basically what DINO does, w/ softmax+CE loss instead of L2.
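To spell out that reduction (my own two-line expansion, using the same notation as the loss in the question below): the repulsive term only sees the negatives through their mean, so an EMA of that mean is all you need.

```latex
R(x_i) \;=\; -\lambda \sum_{k=1}^{K} \lVert x_i - x'_k \rVert^2
\qquad\Longrightarrow\qquad
\nabla_{x_i} R \;=\; -2\lambda \sum_{k=1}^{K} \bigl(x_i - x'_k\bigr)
\;=\; -2\lambda K \bigl(x_i - \bar{x}'\bigr),
\quad \bar{x}' = \tfrac{1}{K}\sum_{k} x'_k
```

So the gradient-descent step contributed by this term is +2ηλK(x_i − x̄'): it just pushes x_i away from the mean embedding x̄'.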
Is there a good reason we use softmax losses in contrastive learning, instead of just doing MSE? i.e. L = ||x_i - x_i'||² - λ Σ_k ||x_i - x_k'||² I'd guess the optimization dynamics are maybe friendlier, but does anyone have a good pointer? Both for CLIP and SSL btw
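To make the comparison concrete, a tiny sketch of both options (illustrative only, not from any codebase): the plain-MSE loss as written above, next to a standard InfoNCE/softmax loss.

```python
import torch
import torch.nn.functional as F

def mse_contrastive(x, x_pos, lam=0.01):
    """The plain-MSE loss from the tweet: pull matched pairs together,
    push every x_i away from all x_k' (illustrative; the sum includes k = i)."""
    attract = ((x - x_pos) ** 2).sum(dim=1)                 # ||x_i - x_i'||^2
    repel = torch.cdist(x, x_pos).pow(2).sum(dim=1)         # sum_k ||x_i - x_k'||^2
    return (attract - lam * repel).mean()

def infonce(x, x_pos, temperature=0.1):
    """Standard softmax/cross-entropy contrastive loss (CLIP / SSL style)."""
    logits = F.normalize(x, dim=1) @ F.normalize(x_pos, dim=1).T / temperature
    targets = torch.arange(x.shape[0], device=x.device)
    return F.cross_entropy(logits, targets)
```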
Oh these plots are great too. They fit with my observations on norms of different things.
arxiv.org/abs/2410.10781… oh, it's interesting that attn sink is caused by angular difference rather than its scale.
Summary of "Massive activations in LLMs": - "artifact" tokens are in all transformers, ViTs and LLMs - their weirdness is ~only on 1 channel - they are the same as the quantization outliers - their purpose is *not* global information - there's a fix simpler than registers
Could you give a summary for all the lazy readers who won't open the link?
I also view layernorm as hyperplane proj + hypersphere proj. Hyperplane proj makes no sense, hence we do RMSNorm now. Although don't forget the epsilon: we project onto the hyper*ball* actually.
Absolutely gold article. Changed the way I see Layer Norm
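For anyone who wants to poke at the geometry, a tiny numpy sketch of that view (my own illustration): mean subtraction projects onto the hyperplane orthogonal to the all-ones vector, and dividing by sqrt(var + eps) lands on the sphere of radius sqrt(d) when eps = 0, and strictly inside it (the ball) when eps > 0.

```python
import numpy as np

d, eps = 16, 1e-5
x = np.random.randn(d)

# Step 1: mean subtraction = orthogonal projection onto the hyperplane {v : sum(v) = 0},
# i.e. removing the component along the (normalized) all-ones direction.
ones = np.ones(d) / np.sqrt(d)
x_centered = x - x.mean()
assert np.allclose(x_centered, x - (x @ ones) * ones)

# Step 2: dividing by sqrt(var + eps) lands inside the ball of radius sqrt(d);
# with eps = 0 it would land exactly on the sphere of radius sqrt(d).
y = x_centered / np.sqrt(x_centered.var() + eps)
print(np.linalg.norm(y), np.sqrt(d))   # norm <= sqrt(d), equal only when eps = 0

# RMSNorm skips the centering (the hyperplane projection) and only rescales.
y_rms = x / np.sqrt((x ** 2).mean() + eps)
```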
Ok there's a new paper in my top 3 favorites: "Vision Transformers Need Registers". Clear problem, elegant solution, well written, easy to understand, good results, limitations included. No fancy losses or layers. No equation (at all!) Here's a short summary: (1/4)
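A minimal sketch of the register idea as I understand it, for readers who want it in code: a few extra learnable tokens appended to the input sequence, used as attention scratch space and discarded at the output. Illustrative only; `backbone` stands in for any ViT block stack.

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Sketch of register tokens: extra learnable tokens appended to the token
    sequence, processed like any other token, then dropped from the output."""
    def __init__(self, backbone, dim, num_registers=4):
        super().__init__()
        self.backbone = backbone
        self.registers = nn.Parameter(torch.zeros(1, num_registers, dim))
        self.num_registers = num_registers

    def forward(self, tokens):                 # tokens: (B, N, dim) = [CLS] + patch tokens
        B = tokens.shape[0]
        reg = self.registers.expand(B, -1, -1)
        out = self.backbone(torch.cat([tokens, reg], dim=1))
        return out[:, :-self.num_registers]    # discard the register tokens at the output
```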