Alex Turner

@Turn_Trout

Research scientist on the scalable alignment team at Google DeepMind. All views are my own. https://turntrout.com

Berkeley, CA

Joined December 2021

57 Following · 3K Followers

Pinned

Alex Turner @Turn_Trout · Jul 12

This study turned out to be fraudulent. Please undo any updates made based on it.

Benjamin Todd @ben_j_todd · Nov 28

There have been papers showing AI might be an equaliser, helping under-performers catch up. I'm skeptical this continues. In a new @MIT paper about materials science, AI boosted the output of top researchers 80%, while the bottom third showed little gains. Why? The AI sped up…

Alex Turner @Turn_Trout · Jul 24

What a cool use of steering vectors!

Jake Ward @_jake_ward · Jul 23

Do reasoning models like DeepSeek R1 learn their behavior from scratch? No! In our new paper, we extract steering vectors from a base model that induce backtracking in a distilled reasoning model, but surprisingly have no apparent effect on the base model itself! 🧵 (1/5)

Alex Turner @Turn_Trout · Jul 22

Another eye-opening paper by @cloud_kx and @OwainEvans_UK. Isn't it scientifically amazing how cognition is trained into networks?

Owain Evans @OwainEvans_UK · Jul 22

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

Alex Turner Retweeted

Scott Emmons @emmons_scott · Jul 9

Paper: arxiv.org/abs/2507.05246 In collaboration with @jenner_erik, @davidelson, @derifatives, @sen_r, @heng__chen, @irhumshafkat, and @rohinmshah at @GoogleDeepMind

Alex Turner Retweeted

Scott Emmons @emmons_scott · Jul 9

Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When we replicate prior unfaithfulness work but increase complexity—unfaithfulness vanishes! Our finding: "When Chain of Thought is Necessary, Language Models…
