Harry Thasarathan
@HThasarathan
PhD student @YorkUniversity @LassondeSchool, I work on computer vision and interpretability.
Check out this amazing work by @fenildoshi009 on holistic shape processing in vision models! 🍀
🧵 What if two images have the same local parts but represent different global shapes purely through part arrangement? Humans can spot the difference instantly! The question is can vision models do the same? 1/15
*Universal Sparse Autoencoders* by @HThasarathan @Napoolar @MatthewKowal9 @CSProfKGD They train a shared SAE latent space on several vision encoders at once, showing, e.g., how the same concept activates in different models. arxiv.org/abs/2502.03714
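For anyone curious about the mechanics, here is a minimal sketch of the shared-latent idea (the class name, dimensions, and top-k sparsity are illustrative assumptions, not the paper's exact setup): each model gets its own encoder and decoder into a single shared sparse concept space, and activations encoded from one model are decoded back into every model's activation space.

```python
import torch
import torch.nn as nn

class UniversalSAE(nn.Module):
    """Sketch: one shared sparse concept space, per-model encoders/decoders."""
    def __init__(self, model_dims, n_concepts, k=32):
        super().__init__()
        self.k = k  # number of active concepts per sample (top-k sparsity)
        self.encoders = nn.ModuleDict({m: nn.Linear(d, n_concepts) for m, d in model_dims.items()})
        self.decoders = nn.ModuleDict({m: nn.Linear(n_concepts, d) for m, d in model_dims.items()})

    def forward(self, acts, source):
        # Encode one model's activations into the shared concept space...
        z = self.encoders[source](acts)
        # ...keep only the top-k concepts (sparse code)...
        topk = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        # ...and decode into every model's activation space (cross-model reconstruction).
        return {m: dec(z_sparse) for m, dec in self.decoders.items()}

# Hypothetical usage; real runs would use cached ViT token activations, not random tensors.
usae = UniversalSAE({"dinov2": 768, "siglip": 1152}, n_concepts=16384)
recons = usae(torch.randn(8, 768), source="dinov2")  # reconstructs both models' activations
```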
Around CVPR for the next 2 days—if you're into interpretability, SAEs, complexity, or just wanna know how cool @KempnerInst is, hit me up 👋
🚨 New paper alert! Linear representation hypothesis (LRH) argues concepts are encoded as **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔 1/11
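A toy illustration of what the LRH claims, under hedged assumptions (random unit vectors standing in for real concept directions): an activation is a sparse, weighted sum of a few fixed directions, and an SAE is built to recover exactly that kind of decomposition.

```python
import torch

# Hypothetical setup: d-dim activations, a dictionary of concept directions.
d, n_concepts = 64, 512
directions = torch.nn.functional.normalize(torch.randn(n_concepts, d), dim=-1)

# Under the LRH, an activation is (approximately) a sparse sum of a few concept directions:
active = torch.tensor([3, 41, 200])      # which concepts are "on"
coeffs = torch.tensor([1.5, 0.7, 2.1])   # how strongly each fires
x = (coeffs[:, None] * directions[active]).sum(0)

# An SAE tries to recover this decomposition: a sparse code z with x ≈ decoder(z),
# which presumes concepts really are sums of fixed directions in the first place.
```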
i am once again asking more people to do vision model interpretability
We created a canvas that plugs into an image model’s brain. You can use it to generate images in real-time by painting with the latent concepts the model has learned. Try out Paint with Ember for yourself 👇
Our work finding universal concepts in vision models is accepted at #ICML2025!!! My first major conference paper with my wonderful collaborators and friends @MatthewKowal9 @Julian746267 @Napoolar @CSProfKGD Working with y'all is the best 🥹 Preprint ⬇️
🌌🛰️Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"! arxiv.org/abs/2502.03714 (1/9)
Accepted at #ICML2025! Check out the preprint. Shoutout to the group for an AMAZING research journey @HThasarathan @Julian746267 @Napoolar @MatthewKowal9 This is Harry’s first PhD paper (first year, great start) and Julian’s first ever paper (work done as an undergrad 💪).
How does a diffusion model learn to mimic art styles? 🎨 Our latest work reveals that diffusion models create entirely new art styles to learn the concept of "art" 🤯 Check out these art styles that @StabilityAI's SDXL has learnt. Do you recognize them?🤔 And we found more👇 🧵
♟️♟️Now our work on teaching superhuman chess strategies to grandmasters (including @DGukesh, who became the latest and youngest world chess champion) is published in PNAS! 🎉🎉 Yes, we can transfer machine knowledge to humans to push the frontier of human knowledge…
Excited to share that our paper "Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero" is now out in PNAS! With @weballergy, @banburismus_, @demishassabis, @ulrichpaquet, @_beenkim 🎉 📄 doi.org/10.1073/pnas.2…
Train your vision SAE on Monday, then again on Tuesday, and you'll find only about 30% of the learned concepts match. ⚓ We propose Archetypal SAE which anchors concepts in the real data’s convex hull, delivering stable and consistent dictionaries. arxiv.org/pdf/2502.12892…
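A rough sketch of the anchoring idea as described above (shapes and names are hypothetical, not the paper's code): each dictionary atom is constrained to be a convex combination of cached data activations, so the learned concepts cannot drift outside the data's convex hull between runs.

```python
import torch
import torch.nn.functional as F

# Illustrative setup: instead of free dictionary atoms, each atom is a convex
# combination of real data activations (the "anchors").
data_points = torch.randn(2048, 768)                        # cached activations
mix_logits = torch.nn.Parameter(torch.randn(4096, 2048))    # one row per concept

def dictionary():
    # softmax puts each row on the simplex -> atoms are convex combos of data points
    weights = F.softmax(mix_logits, dim=-1)
    return weights @ data_points                             # (n_concepts, d) dictionary

# Training the SAE with this dictionary keeps every concept tied to the data,
# so repeated runs land on far more consistent dictionaries than free atoms would.
```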
model objectives matter! the self-supervised model learns geometric features (useful for reconstruction!), while the text/image contrastive model learns a different feature set, presumably useful for "is this object in the caption?"
Our method reveals model-specific patterns too: DinoV2 (left) shows specialized geometric features (depth, perspective), while SigLIP (right) captures unique text-aware visual concepts. This opens new paths for understanding model differences! (7/9)
These visuals highlight the differences between DINOv2 and CLIP really well: the latter has these text-induced abstractions that span visual concepts, while the former has more advanced geometric concepts
This project was an absolute blast to work on with @HThasarathan and the team. Everyone did a really great job! I am SUPER excited about these results and the coming extensions that are cooking 🔥😉