Hima Lakkaraju
@hima_lakkaraju
Assistant Professor @Harvard; Co-Chair @trustworthy_ml; #AI #ML #Safety #XAI; Stanford PhD; MIT @techreview #35InnovatorsUnder35; Sloan and Kavli Fellow.
Super excited to share our latest preprint that unifies multiple areas within explainable AI that have been evolving somewhat independently: 1. Feature Attribution 2. Data Attribution 3. Model Component Attribution (aka Mechanistic Interpretability) arxiv.org/abs/2501.18887…

We are seeking emergency reviewers for the NeurIPS 2025 ethics review. If you are interested and available to contribute this week, please sign up at tiny.cc/neurips2025eth…. @NeurIPSConf @trustworthy_ml @acidflask #AI #ML #NeurIPS
‼️ New paper alert with @ushabhalla_: Leveraging the Sequential Nature of Language for Interpretability (openreview.net/pdf?id=hgPf1ki…)! 1/n
Honored to receive the Outstanding Paper Award at #NENLP2025! Our paper examines how post-training reshapes #LLMs and reveals mechanistic effects on knowledge, truthfulness, refusal, and confidence. Paper: arxiv.org/pdf/2504.02904 #AI #NLP #Interpretability #ExplainableAI