Neel Nanda
@NeelNanda5
Mechanistic Interpretability lead at DeepMind. Formerly @AnthropicAI, then independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
After supervising 20+ papers, I have highly opinionated views on writing great ML papers. When I entered the field, I found all of this frustratingly opaque, so I wrote a guide on turning research into high-quality papers with scientific integrity! Hopefully still useful for NeurIPS.

Very cool work! Base models *can* backtrack but often don't, and backtracking is a key skill of CoT models. It turns out the choice to do it draws on base model concepts, put to new use! Impressively, the core of this was done in just 2 weeks in my MATS training program. New applications open this week!
Do reasoning models like DeepSeek R1 learn their behavior from scratch? No! In our new paper, we extract steering vectors from a base model that induce backtracking in a distilled reasoning model, but surprisingly have no apparent effect on the base model itself! 🧵 (1/5)
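Rough illustration of the general steering-vector idea (a minimal difference-of-means sketch, not the paper's exact pipeline; model name, layer index, contrast prompts, and scale below are all placeholder assumptions):

```python
# Sketch: extract a "backtracking" steering vector as a difference of mean activations,
# then add it to the residual stream during generation via a forward hook.
# All specifics (model, layer, prompts, scale) are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper uses a base / distilled reasoning model pair
layer_idx = 6        # hypothetical layer at which to read and steer activations

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def mean_resid(prompts):
    """Mean residual-stream activation at layer_idx over the last token of each prompt."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        acts.append(out.hidden_states[layer_idx][0, -1])  # shape [d_model]
    return torch.stack(acts).mean(dim=0)

# Hypothetical contrast sets: contexts just before a backtrack vs. ordinary continuations.
backtrack_prompts = ["... wait, that can't be right.", "... hmm, let me reconsider."]
baseline_prompts = ["... so the answer is 42.", "... therefore the result follows."]

steering_vec = mean_resid(backtrack_prompts) - mean_resid(baseline_prompts)

# Forward hook that adds the (scaled) vector to the layer's output hidden states.
def hook(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + 4.0 * steering_vec  # scale is a tunable hyperparameter
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[layer_idx].register_forward_hook(hook)
ids = tok("Let's think step by step:", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=40)[0]))
handle.remove()
```

The interesting move in the paper is where the vector comes from versus where it acts: the direction is found in the base model, but the behavioral effect (inducing backtracking) shows up in the distilled reasoning model.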
There is still an opportunity for @OpenAI to live up to its founding promises, instead of abandoning them. Here I explain what this could look like.
CS 2881 by @boazbaraktcs is the university course I've been most excited about in a while. Even better, it features @EdTurner42 and @NeelNanda5's paper on Emergent Misalignment. Anyone interested in AI safety should follow along. windowsontheory.org/2025/07/20/ai-…
Chain of Thought (CoT) monitoring could be a powerful tool for overseeing future AI systems—especially as they become more agentic. That’s why we’re backing a new research paper from a cross-institutional team of researchers pushing this work forward.
Modern reasoning models think in plain English. Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems. I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability.
Go check out Ed and Anna's great work at the ICML Actionable Interpretability Workshop today! (And if you want to replicate their great fashion choices, check out interp[dot]shop)
@EdTurner42 and I are at ICML today presenting our posters on Emergent Misalignment! Come find us at the Actionable Interpretability Workshop and the R2FM Workshop. T-shirt creds to @NeelNanda5 :)