Jack Lindsey
@Jack_W_Lindsey
Neuroscience of AI brains @AnthropicAI. Previously neuroscience of real brains @cu_neurotheory.
We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us! job-boards.greenhouse.io/anthropic/jobs…
Humans and animals can rapidly learn in new environments. What computations support this? We study the mechanisms of in-context reinforcement learning in transformers, and propose how episodic memory can support rapid learning. Work w/ @KanakaRajanPhD: arxiv.org/abs/2506.19686
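(A rough sketch of the setup as I read it: the transformer's weights stay frozen, and it adapts by conditioning on the (state, action, reward) history in its context window, which can be read as a kind of episodic retrieval. All names and dimensions below are my own illustrative assumptions, not code from the paper.)

```python
import torch
import torch.nn as nn

class InContextPolicy(nn.Module):
    """Toy in-context RL policy: weights are frozen at test time;
    adaptation happens purely through the episode history in context."""
    def __init__(self, n_states=10, n_actions=4, d_model=64, n_layers=2, max_len=512):
        super().__init__()
        self.state_emb = nn.Embedding(n_states, d_model)
        self.action_emb = nn.Embedding(n_actions, d_model)
        self.reward_proj = nn.Linear(1, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.to_action = nn.Linear(d_model, n_actions)

    def forward(self, states, actions, rewards):
        # Interleave (s, a, r) embeddings into one causal token sequence.
        s = self.state_emb(states)                    # (B, T, d)
        a = self.action_emb(actions)                  # (B, T, d)
        r = self.reward_proj(rewards.unsqueeze(-1))   # (B, T, d)
        tokens = torch.stack([s, a, r], dim=2).flatten(1, 2)  # (B, 3T, d)
        T = tokens.size(1)
        tokens = tokens + self.pos_emb(torch.arange(T, device=tokens.device))
        mask = nn.Transformer.generate_square_subsequent_mask(T).to(tokens.device)
        h = self.transformer(tokens, mask=mask)
        # Predict the next action from the full history: rewards seen
        # earlier in context are what reshape this policy, not the weights.
        return self.to_action(h[:, -1])
```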
Our interpretability team recently released research that traced the thoughts of a large language model. Now we’re open-sourcing the method. Researchers can generate “attribution graphs” like those in our study, and explore them interactively.
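(For intuition only: edges in an attribution graph estimate the direct effect of one feature on another. A common way to estimate such an effect is activation × gradient; the toy sketch below illustrates that idea and is not the released library's API.)

```python
# Generic activation-x-gradient edge estimate, as an illustration of the
# direct-effect idea behind attribution-graph edges. Not the tool's API.
import torch

def edge_attribution(downstream_value, upstream_activation):
    """Estimate per-element contribution of upstream -> downstream."""
    (grad,) = torch.autograd.grad(downstream_value, upstream_activation,
                                  retain_graph=True)
    return grad * upstream_activation  # large magnitude = strong edge

# Toy usage: a downstream "feature" f2 that depends on upstream f1.
f1 = torch.tensor([1.5, -0.3, 0.0], requires_grad=True)
w = torch.tensor([[2.0, 0.0, 1.0]])
f2 = (w @ f1).sum()
print(edge_attribution(f2, f1))  # per-feature contribution to f2
```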
The Anthropic Interpretability Team is planning a virtual Q&A to answer Qs about how we plan to make models safer, the role of the team at Anthropic, where we’re headed, and what it’s like to work here! Please let us know if you’d be interested: forms.gle/VeZZVz1NFsArzS…
New paper w/@jkminder & @NeelNanda5! What do chat LLMs learn in finetuning? Anthropic introduced a tool for this: crosscoders, an SAE variant. We find key limitations of crosscoders & fix them with BatchTopK crosscoders. This finds interpretable and causal chat-only features! 🧵
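(Sketch of the architecture, with my own naming and shape assumptions rather than the paper's code: a crosscoder jointly encodes base- and chat-model activations into shared latents with per-model decoders, and BatchTopK thresholds activations across the whole batch instead of per example.)

```python
# Minimal BatchTopK crosscoder sketch for base vs. chat activations.
import torch
import torch.nn as nn

class BatchTopKCrosscoder(nn.Module):
    def __init__(self, d_model=512, n_latents=4096, k=32):
        super().__init__()
        self.k = k  # average number of active latents per example
        # One encoder/decoder per model; the latent space is shared.
        self.enc_base = nn.Linear(d_model, n_latents, bias=False)
        self.enc_chat = nn.Linear(d_model, n_latents, bias=False)
        self.enc_bias = nn.Parameter(torch.zeros(n_latents))
        self.dec_base = nn.Linear(n_latents, d_model)
        self.dec_chat = nn.Linear(n_latents, d_model)

    def forward(self, a_base, a_chat):
        # Joint encoding: each latent sees both models' activations.
        z = torch.relu(self.enc_base(a_base) + self.enc_chat(a_chat) + self.enc_bias)
        # BatchTopK sparsity: keep the k * batch_size largest activations
        # across the whole batch, so per-example sparsity can vary.
        B = z.size(0)
        threshold = z.flatten().topk(self.k * B).values[-1]
        z = torch.where(z >= threshold, z, torch.zeros_like(z))
        # Separate per-model reconstructions let chat-only features show up
        # as latents whose base-decoder weights have near-zero norm.
        return self.dec_base(z), self.dec_chat(z), z
```

(The batch-level threshold, rather than a per-example TopK, is what lets some examples use many latents and others few, which is the "BatchTopK" design choice named in the thread.)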
There are at least a dozen dissertations to be written from this paper by Anthropic alone, which gives us some insight into how AIs “think” and reveals a lot of complexity and unexpected abilities, including generalization and planning. transformer-circuits.pub/2025/attributi…
I participated in this as an auditor, poking around in an LLM's brain to find its evil secrets. Most fun I've had at work! Very clever + thoughtful work by the lead authors in designing the model + the game, which set a precedent for how we can validate safety auditing techniques
New Anthropic research: Auditing Language Models for Hidden Objectives. We deliberately trained a model with a hidden misaligned objective and put researchers to the test: Could they figure out the objective without being told?
Big new review! 🟦Open Problems in Mechanistic Interpretability🟦 We bring together perspectives from ~30 top researchers to outline the current frontiers of mech interp. It highlights the open problems that we think the field should prioritize! 🧵