Alex Turner
@Turn_Trout
Research scientist on the scalable alignment team at Google DeepMind. All views are my own. https://turntrout.com
This study turned out to be fraudulent. Please undo any updates made based on it.
There have been papers showing AI might be an equaliser, helping under-performers catch up. I'm skeptical this continues. In a new @MIT paper about materials science, AI boosted the output of top researchers 80%, while the bottom third showed little gain. Why? The AI sped up…
What a cool use of steering vectors!
Do reasoning models like DeepSeek R1 learn their behavior from scratch? No! In our new paper, we extract steering vectors from a base model that induce backtracking in a distilled reasoning model, but surprisingly have no apparent effect on the base model itself! 🧵 (1/5)
Another eye-opening paper by @cloud_kx and @OwainEvans_UK. Isn't it scientifically amazing how cognition is trained into networks?
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Paper: arxiv.org/abs/2507.05246
In collaboration with @jenner_erik, @davidelson, @derifatives, @sen_r, @heng__chen, @irhumshafkat, and @rohinmshah at @GoogleDeepMind
Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When we replicate prior unfaithfulness work but increase complexity—unfaithfulness vanishes! Our finding: "When Chain of Thought is Necessary, Language Models…