Laura Kopf
@lkopf_ml
PhD student in Interpretable Machine Learning @bifoldberlin @TUBerlin
🔍 When do neurons encode multiple concepts? We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity. 📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework arxiv.org/abs/2506.15538 🧵

I am not attending #NeurIPS2024, but I encourage everyone interested in #XAI and #MechInterp to check out our paper on evaluating textual descriptions of neurons! Join @lkopf_ml, @anna_hedstroem, and @Marina_MCV on Thu 09.12, 1 p.m. to 4 p.m. CST at East Exhibit Hall A-C #3107!
NeurIPS has an overwhelming number of papers, so I made myself a hacky spreadsheet of all (well, most) of the interpretability papers - sharing in case others find it useful! It definitely has false negatives and false positives, but hopefully it's better than baseline.
Join us today at the #ICML2024 Workshop on the Next Generation of AI Safety! Find @kirill_bykov and me in Hall A1 at Poster Session #2, from 3:30 PM to 4:30 PM. Looking forward to seeing you there!

Join us at the @icmlconf in Vienna next week. We are presenting two of our papers at the Mechanistic Interpretability and Next Generation of AI Safety workshops:
• CoSy: Evaluating Textual Explanations of Neurons
• Manipulating Feature Visualizations with Gradient Slingshots