Harry Thasarathan
@HThasarathan
PhD student @YorkUniversity @LassondeSchool, I work on computer vision and interpretability.
Check out this amazing work by @fenildoshi009 on holistic shape processing in vision models! 🍀
🧵 What if two images have the same local parts but represent different global shapes purely through part arrangement? Humans can spot the difference instantly! The question is can vision models do the same? 1/15
*Universal Sparse Autoencoders* by @HThasarathan @Napoolar @MatthewKowal9 @CSProfKGD They train a shared SAE latent space on several vision encoders at once, showing, e.g., how the same concept activates in different models. arxiv.org/abs/2502.03714
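For anyone curious about the mechanics, here is a minimal sketch of the shared-latent idea (the class name, dimensions, and top-k sparsity are illustrative assumptions, not the paper's exact setup): each model gets its own encoder and decoder into a single shared sparse concept space, and activations encoded from one model are decoded back into every model's activation space.

```python
import torch
import torch.nn as nn

class UniversalSAE(nn.Module):
    """Sketch: one shared sparse concept space, per-model encoders/decoders."""
    def __init__(self, model_dims, n_concepts, k=32):
        super().__init__()
        self.k = k  # number of active concepts per sample (top-k sparsity)
        self.encoders = nn.ModuleDict({m: nn.Linear(d, n_concepts) for m, d in model_dims.items()})
        self.decoders = nn.ModuleDict({m: nn.Linear(n_concepts, d) for m, d in model_dims.items()})

    def forward(self, acts, source):
        # Encode one model's activations into the shared concept space...
        z = self.encoders[source](acts)
        # ...keep only the top-k concepts (sparse code)...
        topk = torch.topk(z, self.k, dim=-1)
        z_sparse = torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)
        # ...and decode into every model's activation space (cross-model reconstruction).
        return {m: dec(z_sparse) for m, dec in self.decoders.items()}

# Hypothetical usage; real runs would use cached ViT token activations, not random tensors.
usae = UniversalSAE({"dinov2": 768, "siglip": 1152}, n_concepts=16384)
recons = usae(torch.randn(8, 768), source="dinov2")  # reconstructs both models' activations
```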
Around CVPR for the next 2 days—if you're into interpretability, SAEs, complexity, or just wanna know how cool @KempnerInst is, hit me up 👋
🚨 New paper alert! Linear representation hypothesis (LRH) argues concepts are encoded as **sparse sum of orthogonal directions**, motivating interpretability tools like SAEs. But what if some concepts don’t fit that mold? Would SAEs capture them? 🤔 1/11
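A toy illustration of what the LRH claims, under hedged assumptions (random unit vectors standing in for real concept directions): an activation is a sparse, weighted sum of a few fixed directions, and an SAE is built to recover exactly that kind of decomposition.

```python
import torch

# Hypothetical setup: d-dim activations, a dictionary of concept directions.
d, n_concepts = 64, 512
directions = torch.nn.functional.normalize(torch.randn(n_concepts, d), dim=-1)

# Under the LRH, an activation is (approximately) a sparse sum of a few concept directions:
active = torch.tensor([3, 41, 200])      # which concepts are "on"
coeffs = torch.tensor([1.5, 0.7, 2.1])   # how strongly each fires
x = (coeffs[:, None] * directions[active]).sum(0)

# An SAE tries to recover this decomposition: a sparse code z with x ≈ decoder(z),
# which presumes concepts really are sums of fixed directions in the first place.
```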
i am once again asking more people to do vision model interpretability
We created a canvas that plugs into an image model’s brain. You can use it to generate images in real-time by painting with the latent concepts the model has learned. Try out Paint with Ember for yourself 👇
Our work finding universal concepts in vision models is accepted at #ICML2025!!! My first major conference paper with my wonderful collaborators and friends @MatthewKowal9 @Julian746267 @Napoolar @CSProfKGD Working with y'all is the best 🥹 Preprint ⬇️
🌌🛰️Wanna know which features are universal vs unique in your models and how to find them? Excited to share our preprint: "Universal Sparse Autoencoders: Interpretable Cross-Model Concept Alignment"! arxiv.org/abs/2502.03714 (1/9)
Accepted at #ICML2025! Check out the preprint. Shoutout to the group for an AMAZING research journey @HThasarathan @Julian746267 @Napoolar @MatthewKowal9 This is Harry’s first PhD paper (first year, great start) and Julian’s first ever paper (work done as an undergrad 💪).
How does a diffusion model learn to mimic art styles? 🎨 Our latest work reveals that diffusion models create entirely new art styles to learn the concept of "art" 🤯 Check out these art styles that @StabilityAI's SDXL has learnt. Do you recognize them?🤔 And we found more👇 🧵
♟️♟️Now our work on teaching superhuman chess strategies to grandmasters (including @DGukesh, who became the latest and youngest world chess champion) is published in PNAS! 🎉🎉 Yes, we can transfer machine knowledge to humans to push the frontier of human knowledge…
Excited to share that our paper "Bridging the human–AI knowledge gap through concept discovery and transfer in AlphaZero" is now out in PNAS! With @weballergy, @banburismus_, @demishassabis, @ulrichpaquet, @_beenkim 🎉 📄 doi.org/10.1073/pnas.2…
Train your vision SAE on Monday, then again on Tuesday, and you'll find only about 30% of the learned concepts match. ⚓ We propose Archetypal SAE which anchors concepts in the real data’s convex hull, delivering stable and consistent dictionaries. arxiv.org/pdf/2502.12892…
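A rough sketch of the anchoring idea as described above (shapes and names are hypothetical, not the paper's code): each dictionary atom is constrained to be a convex combination of cached data activations, so the learned concepts cannot drift outside the data's convex hull between runs.

```python
import torch
import torch.nn.functional as F

# Illustrative setup: instead of free dictionary atoms, each atom is a convex
# combination of real data activations (the "anchors").
data_points = torch.randn(2048, 768)                        # cached activations
mix_logits = torch.nn.Parameter(torch.randn(4096, 2048))    # one row per concept

def dictionary():
    # softmax puts each row on the simplex -> atoms are convex combos of data points
    weights = F.softmax(mix_logits, dim=-1)
    return weights @ data_points                             # (n_concepts, d) dictionary

# Training the SAE with this dictionary keeps every concept tied to the data,
# so repeated runs land on far more consistent dictionaries than free atoms would.
```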
model objectives matter! the self-supervised model learns geometric features (useful for reconstruction!), while the text/image contrastive model learns a different feature set, presumably useful for "is this object in the caption?"
Our method reveals model-specific patterns too: DinoV2 (left) shows specialized geometric features (depth, perspective), while SigLIP (right) captures unique text-aware visual concepts. This opens new paths for understanding model differences! (7/9)
These visuals highlight the differences between DINOv2 and CLIP really well: the latter has these text-induced abstractions that span visual concepts, while the former has more advanced geometric concepts
This project was an absolute blast to work on with @HThasarathan and the team. Everyone did a really great job! I am SUPER excited about these results and the coming extensions that are cooking 🔥😉