Nikhil Prakash
@nikhil07prakash
CS Ph.D. @KhouryCollege with @davidbau, working on DNN interpretability. Interning at @Apple.
How do language models track the mental states of each character in a story, often referred to as Theory of Mind? Our recent work takes a step toward demystifying it by reverse-engineering how Llama-3-70B-Instruct solves a simple belief-tracking task, and surprisingly found that it…

‼️🕚New paper alert with @ushabhalla_: Leveraging the Sequential Nature of Language for Interpretability (openreview.net/pdf?id=hgPf1ki…)! 1/n
Context windows are huge now (1M+ tokens) but context depth remains limited. Attention can only resolve one link at a time. Our tiny 5-layer model beats GPT-4.5 on a task requiring deep recursion. How? It learned to divide & conquer. Why this matters🧵
🚨 New preprint! 🚨 Everyone loves causal interp. It’s coherently defined! It makes testable predictions about mechanistic interventions! But what if we had a different objective: predicting model behavior not under mechanistic interventions, but on unseen input data?
🚨 Registration is live! 🚨 The New England Mechanistic Interpretability (NEMI) Workshop is happening August 22nd 2025 at Northeastern University! A chance for the mech interp community to nerd out on how models really work 🧠🤖 🌐 Info: nemiconf.github.io/summer25/ 📝 Register:…
The new "Lookback" paper from @nikhil07prakash contains a surprising insight... 70b/405b LLMs use double pointers! Akin to C programmers' double (**) pointers. They show up when the LLM is "knowing what Sally knows Ann knows", i.e., Theory of Mind. x.com/nikhil07prakas…
LLM, reinventing age-old symbolic tools one step at a time