Thomas Fel
@Napoolar
Explainability, Computer Vision, Neuro-AI @Harvard. Research Fellow @KempnerInst. Prev. @tserre lab, @Google, @GoPro. Crêpe lover.
Train your vision SAE on Monday, then again on Tuesday, and you'll find only about 30% of the learned concepts match. ⚓ We propose Archetypal SAE, which anchors concepts in the real data’s convex hull, delivering stable and consistent dictionaries. arxiv.org/pdf/2502.12892…
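Rough sketch of the anchoring idea in PyTorch (illustrative only, not the paper's code; `ArchetypalSAE` and its parameter names are made up): each dictionary atom is forced to be a convex combination of real data points, so the learned concepts can't drift outside the data's convex hull between runs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArchetypalSAE(nn.Module):
    # Illustrative sketch: dictionary atoms are convex combinations of a
    # fixed set of real data points ("anchors"), so every concept lies in
    # the convex hull of the data.
    def __init__(self, anchors: torch.Tensor, n_concepts: int):
        super().__init__()
        self.register_buffer("anchors", anchors)          # (n_anchors, d)
        self.encoder = nn.Linear(anchors.shape[1], n_concepts)
        self.mix_logits = nn.Parameter(torch.randn(n_concepts, anchors.shape[0]))

    def dictionary(self) -> torch.Tensor:
        # Softmax rows are nonnegative and sum to 1 (row-stochastic), so
        # each atom is a convex combination of the anchors.
        return torch.softmax(self.mix_logits, dim=-1) @ self.anchors

    def forward(self, x: torch.Tensor):
        codes = F.relu(self.encoder(x))                   # sparse, nonnegative codes
        recon = codes @ self.dictionary()                 # reconstruct from anchored atoms
        return recon, codes
```

Training then only moves atoms around inside the hull, which is the intuition behind the run-to-run stability.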

Beyond robustness: Lipschitz networks = stability. Different inits, different seeds, different weights—same function. A thread 🧵
Great excuse to share something I really love: 1-Lipschitz nets. They give clean theory, certs for robustness, the right loss for W-GANs, even nicer grads for explainability!! Yet they're still niche. Here’s a speed-run through some of my favorite papers in the field. 🧵👇
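To make "certs for robustness" concrete, a toy sketch (my illustration, using PyTorch's built-in spectral_norm; the sqrt(2) margin certificate below is the standard Lipschitz-margin bound, not any one paper's result):

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

def lipschitz_mlp(d_in: int, d_hidden: int, d_out: int) -> nn.Sequential:
    # Each spectrally-normalized linear map has Lipschitz constant ~1 (w.r.t. L2),
    # and ReLU is 1-Lipschitz, so the composition is 1-Lipschitz end to end.
    return nn.Sequential(
        spectral_norm(nn.Linear(d_in, d_hidden)),
        nn.ReLU(),
        spectral_norm(nn.Linear(d_hidden, d_out)),
    )

net = lipschitz_mlp(784, 256, 10)
logits = net(torch.randn(1, 784))

# For a 1-Lipschitz net, a logit margin m between the top two classes
# certifies that no L2 perturbation smaller than m / sqrt(2) can flip
# the prediction.
margin = logits.topk(2).values.diff(dim=-1).abs()
certified_radius = margin / (2 ** 0.5)
```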
I'd like to highlight this very cool finding by @jcz12856876. He finds LLMs have harmfulness representations distinct from refusal! He can use this distinction to detect some jailbreaking attacks... It's an excellent step towards precise, interpretable control of safety behavior.
1/ 🚨New Paper 🚨 LLMs are trained to refuse harmful instructions, but internally, do they see harmfulness and refusal as the same? ⚔️We find causal evidence that 👈”LLMs encode harmfulness and refusal separately” 👉. ✂️LLMs may know a prompt is harmful internally yet still…
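One way to picture the separate-encoding claim (an illustrative probe sketch with placeholder data, not the paper's experiments): fit independent linear probes for harmfulness and refusal on hidden states and compare their directions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder activations and labels; in the real setting these would be
# hidden states from an LLM over harmful vs. benign prompts.
rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 4096))
harmful = rng.integers(0, 2, 1000)
refused = rng.integers(0, 2, 1000)

w_harm = LogisticRegression(max_iter=1000).fit(acts, harmful).coef_[0]
w_ref = LogisticRegression(max_iter=1000).fit(acts, refused).coef_[0]

# Low cosine similarity between the probe directions would be consistent
# with harmfulness and refusal being encoded separately.
cos = w_harm @ w_ref / (np.linalg.norm(w_harm) * np.linalg.norm(w_ref))
print(f"cosine(harmfulness, refusal) = {cos:.3f}")
```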
Can synapses in the brain switch their signs between excitatory and inhibitory during learning🚦? Can they act more like weights in artificial neural networks, able to switch signs based on experience 🔃? Excited to share my thesis work in @blsabatini lab! 🧵 ⬇️ (1/13)
Preprint of (not) today: Bohacek and Fel et al., "Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders" -- arxiv.org/abs/2506.19708 What are some things that text-to-image generators cannot generate? An interesting systematic way to look into it.
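The comparison this enables, sketched with placeholder tensors (my paraphrase of the idea, not the paper's pipeline): encode real and generated images with the same SAE, then flag concepts that fire often on real data but rarely on generations.

```python
import torch

def concept_rates(codes: torch.Tensor, thresh: float = 1e-3) -> torch.Tensor:
    # codes: (n_images, n_concepts) sparse SAE activations ->
    # fraction of images on which each concept fires.
    return (codes > thresh).float().mean(dim=0)

# Placeholder codes; in practice these come from running the SAE on real
# images vs. images sampled from the text-to-image model.
real_codes = torch.rand(5000, 1024) * (torch.rand(5000, 1024) > 0.95)
gen_codes = torch.rand(5000, 1024) * (torch.rand(5000, 1024) > 0.97)

gap = concept_rates(real_codes) - concept_rates(gen_codes)
blindspot_ids = torch.topk(gap, k=10).indices  # concepts the generator under-produces
print(blindspot_ids)
```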
Why do video models handle motion so poorly? It might be lack of motion equivariance. Very excited to introduce: Flow Equivariant RNNs (FERNNs), the first sequence models to respect symmetries over time. Paper: arxiv.org/abs/2507.14793 Blog: kempnerinstitute.harvard.edu/research/deepe… 1/🧵
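The symmetry in question, as a toy check (my illustration, not the FERNN code; assumes the model maps a (T, C, H, W) clip to outputs of the same shape): a constant-velocity shift of the input should produce the same constant-velocity shift of the output.

```python
import torch

def shift_flow(video: torch.Tensor, v: int) -> torch.Tensor:
    # video: (T, C, H, W); translate frame t horizontally by v * t pixels,
    # i.e. apply a constant-velocity flow to the clip.
    return torch.stack([frame.roll(shifts=v * t, dims=-1)
                        for t, frame in enumerate(video)])

def flow_equivariance_gap(model, video: torch.Tensor, v: int) -> float:
    # Zero gap means f(shift_v(x)) == shift_v(f(x)): the model carries the
    # flow through, rather than re-recognizing the moving content per frame.
    return (model(shift_flow(video, v)) - shift_flow(model(video), v)).abs().max().item()
```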
Excited to share new work @icmlconf by Loek van Rossem exploring the development of computational algorithms in recurrent neural networks. Hear it live tomorrow, Oral 1D, Tues 14 Jul West Exhibition Hall C: icml.cc/virtual/2025/p… Paper: openreview.net/forum?id=3go0l… (1/11)
Nice survey of papers working towards NNs with somewhat practical, realistic Lipschitz bounds.