Iason Gabriel
@IasonGabriel
Philosopher & Research Scientist @GoogleDeepMind | Humanity, Ethics & Alignment Team Lead | #TIME100AI | All views are my own
1. What are the ethical and societal implications of advanced AI assistants? What might change in a world with more agentic AI? Our new paper explores these questions: storage.googleapis.com/deepmind-media… It’s the result of a one-year research collaboration involving 50+ researchers… a 🧵

This paper is absolutely essential reading for anyone interested in developing a science of AI safety and evaluation. I esp. appreciate the “principle of parsimony”: Behaviours should not be attributed to complex mental processes if simpler explanations are available ✅
In a new paper, we examine recent claims that AI systems have been observed ‘scheming’, or making strategic attempts to mislead humans. We argue that to test these claims properly, more rigorous methods are needed.
Today (w/ @UniofOxford @Stanford @MIT @LSEnews) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues. We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more 🧵
We’re hiring a sociological research scientist @GoogleDeepMind! Work with the inimitable @KLdivergence, @weidingerlaura, @iamtrask, @canfer_akbulut, Julia Haas & many others 🙌
I'm hiring! job-boards.greenhouse.io/deepmind/jobs/…
Insurance is an underrated way to unlock secure AI progress. Insurers are incentivized to truthfully quantify and track risks: if they overstate risks, they get outcompeted; if they understate risks, their payouts bankrupt them. 1/9
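The incentive claim in that thread can be made concrete with a toy pricing simulation. The sketch below is my own illustration, not a model from the thread, and every name and parameter in it is hypothetical: an insurer quotes premiums off its stated risk estimate, claims arrive at the true risk, and a truthful competitor sets the market price, so overstating loses customers and understating loses money.

```python
import random

def simulate_insurer(stated_risk: float, true_risk: float,
                     n_policies: int = 100_000, payout: float = 100.0,
                     margin: float = 0.05, seed: int = 0) -> float:
    """Toy model (illustrative only): profit of an insurer that prices off
    `stated_risk` while claims actually arrive at `true_risk`."""
    rng = random.Random(seed)
    premium = stated_risk * payout * (1 + margin)    # this insurer's quote
    fair_quote = true_risk * payout * (1 + margin)   # a truthful competitor's quote
    if premium > fair_quote:
        return 0.0                                   # overstated risk: outcompeted, no sales
    claims = sum(payout for _ in range(n_policies) if rng.random() < true_risk)
    return n_policies * premium - claims             # understated risk: claims swamp revenue

# True risk is 1%; compare understating, truth-telling, and overstating.
for stated in (0.005, 0.010, 0.020):
    print(f"stated_risk={stated:.3f}  profit={simulate_insurer(stated, 0.01):>12,.0f}")
```

In this toy setup only the truthful quote survives: cheaper quotes bleed money on claims, dearer ones sell nothing, which is the thread's point that insurers are paid to quantify risk accurately.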
Check out this great new initiative + paper led by @ryan_t_lowe, @edelwax, @xuanalogue, @klingefjord & the fine folks @meaningaligned! Using rich representations of value, we aim to make headway on some of the most pressing AI alignment challenges! See: full-stack-alignment.ai

There are times when it feels like we've been doing thousands of years of philosophy just to prepare for the current moment.
Reward models (RMs) are the moral compass of LLMs – but no one has x-rayed them at scale. We just ran the first exhaustive analysis of 10 leading RMs, and the results were… eye-opening. Wild disagreement, base-model imprint, identity-term bias, mere-exposure quirks & more: 🧵
Check out this work by @saffronhuang – one of the best researchers thinking about the ethical & societal impacts of AGI.
I updated my personal website! It was pretty hard to explore before, and I wanted to properly highlight the work/ideas that I want people to read and that I stand behind. Will keep tweaking, but have a look. :) saffronhuang.com