Hima Lakkaraju
@hima_lakkaraju
Assistant Professor @Harvard; Co-Chair @trustworthy_ml; #AI #ML #Safety #XAI; Stanford PhD; MIT @techreview #35InnovatorsUnder35; Sloan and Kavli Fellow.
Super excited to share our latest preprint that unifies multiple areas within explainable AI that have been evolving somewhat independently: 1. Feature Attribution 2. Data Attribution 3. Model Component Attribution (aka Mechanistic Interpretability) arxiv.org/abs/2501.18887…

We are seeking emergency reviewers for the NeurIPS 2025 ethics review. If you are interested and available to contribute this week, please sign up at tiny.cc/neurips2025eth…. @NeurIPSConf @trustworthy_ml @acidflask #AI #ML #NeurIPS
‼️ New paper alert with @ushabhalla_: Leveraging the Sequential Nature of Language for Interpretability (openreview.net/pdf?id=hgPf1ki…)! 1/n
Honored to receive the Outstanding Paper Award at #NENLP2025! Our paper examines how post-training reshapes #LLMs and reveals mechanistic effects on knowledge, truthfulness, refusal, and confidence. Paper: arxiv.org/pdf/2504.02904 #AI #NLP #Interpretability #ExplainableAI