Sarah Cogan
@sarah_cogan
existential risks are bad. I’m tall. SWE @GoogleDeepMind Frontier Safety
A simple AGI safety technique: AI’s thoughts are in plain English, just read them. We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc. threaten transparency. Experts from many orgs agree we should try to preserve it:…
pro tip: if you ask for birthday presents, not only will you get gifts, your friends will commend you for asking!

As models advance, a key AI safety concern is deceptive alignment / "scheming" – where AI might covertly pursue unintended goals. Our paper "Evaluating Frontier Models for Stealth and Situational Awareness" assesses whether current models can scheme. arxiv.org/abs/2505.01420
Full transparency, I did not know about this section on pages 278-279 of the OBBB that strips states of the right to make laws or regulate AI for 10 years. I am adamantly OPPOSED to this and it is a violation of state rights and I would have voted NO if I had known this was in…
was legit worried this was about me until I remembered I was rehearsing *my* breakup with Gemini Live in the Mission
eating sushi in salesforce park, listening to the woman next to me use chatgpt to rehearse her breakup conversation
Trying to land this plane safely on the Hudson
if you're 25+ and listening to evanescence or creed, wtf are you doing?
you would not believe what it took to get this into the dataset for the ai to parrot it (stochastically)
Veo3 is now in Turkey. ✨ Try Gemini yourself. Prompt: Imagine a computer keyboard made of baklava with plenty of pistachios. We see a person's hands typing on their laptop by pressing the baklava keys.
date my bestie ✨
I wrote a dating doc! Read, send to your friends, and signal boost to help me find my future wife. Link in reply
I’m the most powerful archetype there is- ethereal mid
has anyone stopped to ask WHY students cheat? would a buddhist monk "cheat" at meditation? would an artist "cheat" at painting? no. when process and outcomes are aligned, there's no incentive to cheat. so what's happening differently at colleges? the answer is in the article:
Per our Frontier Safety Framework, we continue to test our models for critical capabilities. Here’s the updated model card for Gemini 2.5 Pro with frontier safety evaluations + explanation of how our safety buffer / alert thresholds approach applies to 2.0, 2.5, and what’s coming.…
Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.) AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.
AGI could revolutionize many fields - from healthcare to education - but it's crucial that it’s developed responsibly. Today, we’re sharing how we’re thinking about safety and security on the path to AGI. → goo.gle/3R08XcD
I have long timelines (like, I’m on Twitter a lot)
love to go to a party in SF and discuss everything we’ve been seeing on our timelines
Took the Love Science personality test and I think my personality is Good.
Took the Love Science personality test and I think my personality is bad.
But doctor…I am Pagliacci
most engineers can't solve this old google interview question: imagine interacting with a woman. you must carry conversation for 60 seconds; what would you do?
We're hiring! Join an elite team that sets an AGI safety approach for all of Google -- both through development and implementation of the Frontier Safety Framework (FSF), and through research that enables a future stronger FSF.