Thomas Fel
@Napoolar
Explainability, Computer Vision, Neuro-AI @Harvard. Research Fellow @KempnerInst. Prev. @tserre lab, @Google, @GoPro. Crêpe lover.
Train your vision SAE on Monday, then again on Tuesday, and you'll find only about 30% of the learned concepts match. ⚓ We propose Archetypal SAE, which anchors concepts in the real data’s convex hull, delivering stable and consistent dictionaries. arxiv.org/pdf/2502.12892…
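Rough sketch of the anchoring idea in PyTorch (illustrative only, not the paper's code; `ArchetypalSAE` and its parameter names are made up): each dictionary atom is forced to be a convex combination of real data points, so the learned concepts can't drift outside the data's convex hull between runs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArchetypalSAE(nn.Module):
    # Illustrative sketch: dictionary atoms are convex combinations of a
    # fixed set of real data points ("anchors"), so every concept lies in
    # the convex hull of the data.
    def __init__(self, anchors: torch.Tensor, n_concepts: int):
        super().__init__()
        self.register_buffer("anchors", anchors)          # (n_anchors, d)
        self.encoder = nn.Linear(anchors.shape[1], n_concepts)
        self.mix_logits = nn.Parameter(torch.randn(n_concepts, anchors.shape[0]))

    def dictionary(self) -> torch.Tensor:
        # Softmax rows are nonnegative and sum to 1 (row-stochastic), so
        # each atom is a convex combination of the anchors.
        return torch.softmax(self.mix_logits, dim=-1) @ self.anchors

    def forward(self, x: torch.Tensor):
        codes = F.relu(self.encoder(x))                   # sparse, nonnegative codes
        recon = codes @ self.dictionary()                 # reconstruct from anchored atoms
        return recon, codes
```

Training then only moves atoms around inside the hull, which is the intuition behind the run-to-run stability.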

Beyond robustness: Lipschitz networks = stability. Different inits, different seeds, different weights—same function. A thread 🧵
Great excuse to share something I really love: 1-Lipschitz nets. They give clean theory, certs for robustness, the right loss for W-GANs, even nicer grads for explainability!! Yet they're still niche. Here’s a speed-run through some of my favorite papers in the field. 🧵👇
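To make "certs for robustness" concrete, a toy sketch (my illustration, using PyTorch's built-in spectral_norm; the sqrt(2) margin certificate below is the standard Lipschitz-margin bound, not any one paper's result):

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

def lipschitz_mlp(d_in: int, d_hidden: int, d_out: int) -> nn.Sequential:
    # Each spectrally-normalized linear map has Lipschitz constant ~1 (w.r.t. L2),
    # and ReLU is 1-Lipschitz, so the composition is 1-Lipschitz end to end.
    return nn.Sequential(
        spectral_norm(nn.Linear(d_in, d_hidden)),
        nn.ReLU(),
        spectral_norm(nn.Linear(d_hidden, d_out)),
    )

net = lipschitz_mlp(784, 256, 10)
logits = net(torch.randn(1, 784))

# For a 1-Lipschitz net, a logit margin m between the top two classes
# certifies that no L2 perturbation smaller than m / sqrt(2) can flip
# the prediction.
margin = logits.topk(2).values.diff(dim=-1).abs()
certified_radius = margin / (2 ** 0.5)
```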
I'd like to highlight this very cool finding by @jcz12856876. He finds LLMs have harmfulness representations distinct from refusal! He can use this distinction to detect some jailbreaking attacks... It's an excellent step towards precise, interpretable control of safety behavior.
1/ 🚨New Paper 🚨 LLMs are trained to refuse harmful instructions, but internally, do they see harmfulness and refusal as the same? ⚔️We find causal evidence that 👈”LLMs encode harmfulness and refusal separately” 👉. ✂️LLMs may know a prompt is harmful internally yet still…
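One way to picture the separate-encoding claim (an illustrative probe sketch with placeholder data, not the paper's experiments): fit independent linear probes for harmfulness and refusal on hidden states and compare their directions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder activations and labels; in the real setting these would be
# hidden states from an LLM over harmful vs. benign prompts.
rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 4096))
harmful = rng.integers(0, 2, 1000)
refused = rng.integers(0, 2, 1000)

w_harm = LogisticRegression(max_iter=1000).fit(acts, harmful).coef_[0]
w_ref = LogisticRegression(max_iter=1000).fit(acts, refused).coef_[0]

# Low cosine similarity between the probe directions would be consistent
# with harmfulness and refusal being encoded separately.
cos = w_harm @ w_ref / (np.linalg.norm(w_harm) * np.linalg.norm(w_ref))
print(f"cosine(harmfulness, refusal) = {cos:.3f}")
```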
Can synapses in the brain switch their signs between excitatory and inhibitory during learning🚦? Can they act more like weights in artificial neural networks, able to switch signs based on experience 🔃? Excited to share my thesis work in @blsabatini lab! 🧵 ⬇️ (1/13)
Preprint of (not) today: Bohacek and Fel et al., "Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders" -- arxiv.org/abs/2506.19708 What are some things that text-to-image generators cannot generate? An interesting systematic way to look into it.
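The comparison this enables, sketched with placeholder tensors (my paraphrase of the idea, not the paper's pipeline): encode real and generated images with the same SAE, then flag concepts that fire often on real data but rarely on generations.

```python
import torch

def concept_rates(codes: torch.Tensor, thresh: float = 1e-3) -> torch.Tensor:
    # codes: (n_images, n_concepts) sparse SAE activations ->
    # fraction of images on which each concept fires.
    return (codes > thresh).float().mean(dim=0)

# Placeholder codes; in practice these come from running the SAE on real
# images vs. images sampled from the text-to-image model.
real_codes = torch.rand(5000, 1024) * (torch.rand(5000, 1024) > 0.95)
gen_codes = torch.rand(5000, 1024) * (torch.rand(5000, 1024) > 0.97)

gap = concept_rates(real_codes) - concept_rates(gen_codes)
blindspot_ids = torch.topk(gap, k=10).indices  # concepts the generator under-produces
print(blindspot_ids)
```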
Why do video models handle motion so poorly? It might be lack of motion equivariance. Very excited to introduce: Flow Equivariant RNNs (FERNNs), the first sequence models to respect symmetries over time. Paper: arxiv.org/abs/2507.14793 Blog: kempnerinstitute.harvard.edu/research/deepe… 1/🧵
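The symmetry in question, as a toy check (my illustration, not the FERNN code; assumes the model maps a (T, C, H, W) clip to outputs of the same shape): a constant-velocity shift of the input should produce the same constant-velocity shift of the output.

```python
import torch

def shift_flow(video: torch.Tensor, v: int) -> torch.Tensor:
    # video: (T, C, H, W); translate frame t horizontally by v * t pixels,
    # i.e. apply a constant-velocity flow to the clip.
    return torch.stack([frame.roll(shifts=v * t, dims=-1)
                        for t, frame in enumerate(video)])

def flow_equivariance_gap(model, video: torch.Tensor, v: int) -> float:
    # Zero gap means f(shift_v(x)) == shift_v(f(x)): the model carries the
    # flow through, rather than re-recognizing the moving content per frame.
    return (model(shift_flow(video, v)) - shift_flow(model(video), v)).abs().max().item()
```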
Excited to share new work @icmlconf by Loek van Rossem exploring the development of computational algorithms in recurrent neural networks. Hear it live tomorrow, Oral 1D, Tues 14 Jul West Exhibition Hall C: icml.cc/virtual/2025/p… Paper: openreview.net/forum?id=3go0l… (1/11)
Nice survey of papers working towards NNs with somewhat practical, realistic Lipschitz bounds.