David Lindner

@davlindner

Making AI safer @GoogleDeepMind

London, UK

Joined April 2012

329Following

2KFollowers

Pinned

David Lindner@davlindner · Jul 8

Excited to share some technical details about our approach to scheming and deceptive alignment as outlined in Google's Frontier Safety Framework! (1) current models are not yet capable of realistic scheming (2) CoT monitoring is a promising mitigation for future scheming

VVictoria Krakovna@vkrakovna · Jul 8

As models advance, a key AI safety concern is deceptive alignment / "scheming" – where AI might covertly pursue unintended goals. Our paper "Evaluating Frontier Models for Stealth and Situational Awareness" assesses whether current models can scheme. arxiv.org/abs/2505.01420

1.0K

David Lindner@davlindner · Jul 17

I'll be presenting MONA at ICML in the afternoon poster session today. Come stop by from 4:30 pm at East Exhibition Hall E-902

DDavid Lindner@davlindner · Jan 23

New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward? Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them! Inspired by myopic optimization but better performance – details in🧵

958

David Lindner@davlindner · Jul 15

Legible chain-of-thought is incredibly useful for building safe AI. So let's not loose this property!

MMikita Balesni 🇺🇦@balesni · Jul 15

A simple AGI safety technique: AI’s thoughts are in plain English, just read them We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it:…

329

David Lindner@davlindner · Jul 13

I'll be at ICML this week, looking forward to catching up with old friends and meeting new faces. Lmk if you want to chat!

275

David Lindner Retweeted

METR@METR_Evals · Jul 10

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

236

1.0K

7.0K

3.0K

3.5M

David Lindner@davlindner · Jul 9

Two new papers that elaborate on our approach to deceptive alignment! First paper: we evaluate the model's *stealth* and *situational awareness* -- if they don't have these capabilities, they likely can't cause severe harm. x.com/vkrakovna/stat…

VVictoria Krakovna@vkrakovna · Jul 8

104

41.0K

David Lindner Retweeted

Scott Emmons@emmons_scott · Jul 9

Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When we replicate prior unfaithfulness work but increase complexity—unfaithfulness vanishes! Our finding: "When Chain of Thought is Necessary, Language Models…

170

64.0K

David Lindner@davlindner · Jul 6

New episode with @SamuelAlbanie, where we discuss the recent Google DeepMind paper "An Approach to Technical AGI Safety and Security"! Link to watch below.

AAXRP - the AI X-risk Research Podcast@AXRPodcast · Jul 6

Episode 45 - Samuel Albanie on DeepMind's AGI Safety Approach axrp.net/episode/2025/0…

2.0K

David Lindner@davlindner · Jun 16

Had a great conversation with Daniel about our MONA paper. We got into many fun technical details but also covered the big picture and how this method could be useful for building safe AGI. Thanks for having me on!

DDaniel Filan@dfrsrchtwts · Jun 15

New episode with @davlindner, covering his work on MONA! Check it out - video link in reply.

8.0K