Understandable Machine Intelligence Lab
@UMI_Lab_AI
Understandable Machine Intelligence Lab: We bring #explainable #AI to the next level. Part of @LeibnizATB, Ex @TUBerlin, funded by @BMBF_Bund #XAI
Our latest paper is out! 🚀
🔍 When do neurons encode multiple concepts? We introduce PRISM, a framework for extracting multi-concept feature descriptions to better understand polysemanticity. 📄 Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework arxiv.org/abs/2506.15538 🧵
If you're at #AAAI2025 don't miss our poster today (alignment track)! Paper 📘: arxiv.org/pdf/2502.15403 Code 👩‍💻: github.com/annahedstroem/… Joint work with @eirasf and @Marina_MCV
At 12:30 I'll be happy to take questions about our poster presentation at #AAAI2025. Is your explanation for a model's prediction better than the alternatives? "Evaluate with the Inverse: Efficient Approximation of Latent Explanation Quality Distribution" introduces QGE... 1/4
Still at @NeurIPSConf and curious how evaluation outcomes of interpretability methods can be adversarially attacked? 👉 Sun, 15 Dec, 4 PM (East Ballroom A and B) The Price of Freedom: An Adversarial Attack on Interpretability Evaluation
Interested in eval, vision and mechanistic interpretability? Come chat at our @NeurIPSConf poster! #3107 👉 Thu, 12 Dec, 11 AM (East Exhibit Hall A-C) CoSy: Evaluating Textual Explanations of Neurons openreview.net/pdf?id=R0bnWrp…
First day of #NeurIPS2024: jetlagged but happy to be reunited with @UMI_Lab_AI @Marina_MCV @lkopf_ml
I’ll be presenting our work at @NeurIPSConf in Vancouver! 🎉 Join me this Thursday, December 12th, in East Exhibit Hall A-C, Poster #3107, from 11 a.m. PST to 2 p.m. PST. I'll be discussing our paper “CoSy: Evaluating Textual Explanations of Neurons.”
I am not attending #NeurIPS2024, but I encourage everyone interested in #XAI and #MechInterp to check out our paper on evaluating textual descriptions of neurons! Join @lkopf_ml, @anna_hedstroem, and @Marina_MCV on Thu 12.12, 1 p.m. to 4 p.m. CST at East Exhibit Hall A-C #3107!
✨Introducing quanda: an open-source library for benchmarking training data attribution (TDA) methods in PyTorch! Quanda offers a user-friendly interface for ⚖️ evaluating attributions and 📊 benchmarking TDA methods across diverse metrics.
We (@di_lya, Galip Ümit Yolcu, Niklas Schmolenski, @wiegand_t, @WojciechSamek, @SLapuschkin + me) just launched quanda, a training data attribution (TDA) library built for researchers who want to apply, develop, or evaluate TDA methods. GitHub repo: github.com/dilyabareeva/q…