Victoria Krakovna
@vkrakovna
Research scientist in AI alignment at Google DeepMind. Co-founder of Future of Life Institute @flixrisk. Views are my own and do not represent GDM or FLI.
Holy shit these quotes from Congress are absolutely eye-popping: "...this week lawmakers demonstrated a level of AGI situational awareness that would have been unthinkable just months ago. •“Whether it’s American AI or Chinese AI, it should not be released until we know it’s…
‼️📝 Our new AI Safety Index is out! ➡️ Following our 2024 index, 6 independent AI experts rated leading AI companies - @OpenAI, @AnthropicAI, @AIatMeta, @GoogleDeepMind, @xAI, @deepseek_ai & Zhipu AI - across critical safety and security domains. So what were the results? 🧵👇
Modern reasoning models think in plain English. Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems. Researchers across many organizations, myself included, think we should work to evaluate, preserve, and even improve CoT monitorability.
Chain of thought monitoring looks valuable enough that we’ve put it in our Frontier Safety Framework to address deceptive alignment. This paper is a good explanation of why we’re optimistic – but also why it may be fragile, and what to do to preserve it. x.com/balesni/status…
A simple AGI safety technique: AI’s thoughts are in plain English, just read them We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it:…
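For anyone curious what "just read them" can look like in practice, here's a minimal sketch of a chain-of-thought monitor. Everything in it is hypothetical and for illustration only: the `generate_with_cot` helper, the `ModelOutput` type, and the keyword patterns are stand-ins, not any real API or deployed monitor.

```python
# Minimal illustrative sketch of chain-of-thought monitoring.
# Assumes a hypothetical `generate_with_cot` call that exposes the model's
# visible reasoning alongside its final answer; real systems would wire this
# to an actual model API and typically use a second model as the monitor.
import re
from typing import NamedTuple


class ModelOutput(NamedTuple):
    chain_of_thought: str
    answer: str


def generate_with_cot(prompt: str) -> ModelOutput:
    # Hypothetical stand-in for a real model call.
    return ModelOutput(
        chain_of_thought="I should answer honestly and cite my sources.",
        answer="...",
    )


# Phrases a simple keyword monitor might flag for human review (illustrative).
SUSPICIOUS_PATTERNS = [
    r"hide (this|my) (reasoning|intent)",
    r"the (user|overseer) (must not|shouldn't) (know|notice)",
    r"pretend to be aligned",
]


def monitor_cot(cot: str) -> list[str]:
    """Return the suspicious patterns that match the chain of thought, if any."""
    return [p for p in SUSPICIOUS_PATTERNS if re.search(p, cot, re.IGNORECASE)]


if __name__ == "__main__":
    output = generate_with_cot("Summarize the quarterly report.")
    flags = monitor_cot(output.chain_of_thought)
    if flags:
        print("Escalate for review; matched:", flags)
    else:
        print("No monitor flags; answer:", output.answer)
```

In practice the keyword list would be replaced by a monitor model scoring the transcript, but the overall structure is the same: inspect the reasoning before acting on the answer.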
Two new papers that elaborate on our approach to deceptive alignment! First paper: we evaluate models' *stealth* and *situational awareness* -- if a model doesn't have these capabilities, it likely can't cause severe harm. x.com/vkrakovna/stat…
As models advance, a key AI safety concern is deceptive alignment / "scheming" – where AI might covertly pursue unintended goals. Our paper "Evaluating Frontier Models for Stealth and Situational Awareness" assesses whether current models can scheme. arxiv.org/abs/2505.01420
Great work from my colleagues stress-testing chain-of-thought monitoring. For complex behaviors, models have to expose their reasoning in the chain of thought, making it monitorable. Paper: arxiv.org/abs/2507.05246 Blog post: deepmindsafetyresearch.medium.com/evaluating-and…
Is CoT monitoring a lost cause due to unfaithfulness? 🤔 We say no. The key is the complexity of the bad behavior. When we replicate prior unfaithfulness work but increase complexity—unfaithfulness vanishes! Our finding: "When Chain of Thought is Necessary, Language Models…
The moratorium just got taken out of the budget bill in a LANDSLIDE vote. 99 to 1. Incredible. Thank you to the lawmakers, the children's advocates, the artists and creators, the voters, the labor groups, and everyone who spoke out against the harmful AI law moratorium.
The Singapore Consensus is on arXiv now -- arxiv.org/abs/2506.20702 It offers: 1. An overview of consensus technical AI safety priorities 2. An example of widespread international collab & agreement
I'm honored to be part of arXiv:2506.20702, "The Singapore Consensus on Global AI Safety Research Priorities". Across companies and countries, there's more agreement than you'd think (paper URL in replies):
New video about how to work in technical AI Safety research! (link in reply)
Gemini 2.5 Pro system card has now been updated with frontier safety evaluations results, testing for critical capabilities in CBRN, cybersecurity, ML R&D and deceptive alignment. storage.googleapis.com/model-cards/do…
IMO, this isn't much of an update against CoT monitoring hopes. They show unfaithfulness when the reasoning is minimal enough that it doesn't need CoT. But my hopes for CoT monitoring rest on the expectation that models will have to reason a lot to end up misaligned and cause huge problems. 🧵
New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don't. This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.
If you'd like to learn more about GDM's approach to AGI safety and security, but have limited time, check out this 3-minute talk in our AGI safety course for a quick summary: youtube.com/watch?v=RGh8wP…
Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.) AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.