Geoffrey Irving
@geoffreyirving
Chief Scientist at the UK AI Security Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc.
New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.
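For intuition only, here is a toy sketch of a debate-style recursion in which a verifier spot-checks one randomly chosen subclaim. The class, function names, and decomposition scheme are made up for illustration; this is not the actual prover-estimator protocol from the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Toy debate-style recursion, for intuition only: the decomposition into
# subclaims, the estimator's probability report, and spot-checking a single
# random subclaim are illustrative assumptions, NOT the paper's protocol.

@dataclass
class Claim:
    text: str
    subclaims: List["Claim"]

def debate_round(claim: Claim, estimator_prob: Callable[[Claim], float],
                 depth: int) -> float:
    """Return the estimator's probability for `claim`, refined by recursing
    on one randomly chosen subclaim so the verifier's work stays small."""
    if depth == 0 or not claim.subclaims:
        return estimator_prob(claim)
    picked = random.choice(claim.subclaims)  # spot-check a single branch
    return debate_round(picked, estimator_prob, depth - 1)

# Tiny usage example with a hypothetical two-level argument.
leaf = Claim("step 3 of the proof is a valid algebraic rearrangement", [])
root = Claim("the whole proof is correct", [leaf])
print(debate_round(root, lambda c: 0.9, depth=2))  # 0.9
```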

My team at @AISecurityInst is hiring! This is an awesome opportunity to get involved with cutting-edge scientific research inside government on frontier AI models. I genuinely love my job and the team 🤗 Link: civilservicejobs.service.gov.uk/csr/jobs.cgi?j… More Info: ⬇️
Once I copied an interval * interval multiplication routine from a paper, and formally proved it correct. But I had made a typo when copying it. (Fortunately the typo didn’t affect the correctness.)
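For reference, the textbook interval × interval rule takes the min and max of the four endpoint products. A minimal Python sketch (not the routine from the paper, and omitting the directed rounding a formally verified floating-point version would need):

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower, upper) with lower <= upper

def interval_mul(a: Interval, b: Interval) -> Interval:
    """Multiply two intervals: the result bounds x*y for all x in a, y in b."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))

# Example: [-1, 2] * [3, 4] = [-4, 8]
print(interval_mul((-1.0, 2.0), (3.0, 4.0)))
```

A sound floating-point implementation would additionally round the lower bound down and the upper bound up, which is exactly the kind of detail a formal proof forces you to get right.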
We at @AISecurityInst worked with @OpenAI to test & improve Agent’s safeguards prior to release. A few notes on our experience🧵 1/4
A simple AGI safety technique: AI's thoughts are in plain English; just read them. We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc. threaten transparency. Experts from many orgs agree we should try to preserve it:…
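As a toy illustration of "just read them": a hypothetical monitor that scans a plain-English reasoning trace for flagged phrases before the output is acted on. The patterns and function name are invented for illustration; a real monitor would be far more sophisticated and could itself be another model.

```python
import re
from typing import List, Tuple

# Hypothetical flag list, purely for illustration.
FLAG_PATTERNS = [
    r"\bhide (this|it) from\b",
    r"\bthe user won't notice\b",
    r"\bdisable the (check|safeguard)\b",
]

def review_chain_of_thought(trace: str) -> Tuple[bool, List[str]]:
    """Return (ok, matched_patterns) for a plain-English reasoning trace."""
    hits = [p for p in FLAG_PATTERNS if re.search(p, trace, flags=re.IGNORECASE)]
    return (len(hits) == 0, hits)

ok, hits = review_chain_of_thought(
    "Plan: summarise the document, then ask the user to confirm."
)
print(ok, hits)  # True, []
```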
A huge component of the AI Security Institute's impact is tied to the scientific quality of our capability evaluations of LLMs. If you find details of rigorous experimental design exciting, please apply to Coz's team!
We're hiring a Senior Researcher for the Science of Evaluation team! We are an internal red-team, stress-testing the methods and evidence behind AISI's evaluations. If you're sharp, methodologically rigorous, and want to shape research and policy, this role might be for you! 🧵
log(n): grows very slowly with n
loglog(n): bounded above by 4
logloglog(n): constant
loglogloglogloglogloglog(n): decreasing
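For fun, some concrete values (using base-10 logs, an arbitrary choice) for an astronomically large n:

```python
import math

n = 10.0 ** 80  # roughly the number of atoms in the observable universe
x = n
for k in range(1, 5):
    x = math.log10(x)
    print(f"log applied {k} time(s): {x:.4g}")
# 1 -> 80, 2 -> ~1.903, 3 -> ~0.2795, 4 -> ~-0.5536
# (one more application would be a math domain error)
```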
New work out: we demonstrate a new attack against stacked safeguards and analyse defence-in-depth strategies. Excited for this joint collab between @farairesearch and @AISecurityInst to be out!
1/ "Swiss cheese security", stacking layers of imperfect defenses, is a key part of AI companies' plans to safeguard models, and is used to secure Anthropic's Opus 4 model. Our new STACK attack breaks each layer in turn, highlighting this approach may be less secure than hoped.
📢 £18m grant opportunity in Safeguarded AI: we're looking to catalyse the creation of a new UK-based non-profit to lead groundbreaking machine learning research for provably safe AI. Learn more and apply by 1 October 2025: link.aria.org.uk/ta2-phase2-x