AI Security Institute

@AISecurityInst

We conduct scientific research to understand AI’s most serious risks and develop and test mitigations.

United Kingdom

Joined February 2024

29Following

5KFollowers

AI Security Institute@AISecurityInst · Jul 17

🌏Our largest international joint testing exercise to date brought together participants from across the globe to advance the science of AI agent evaluations. Read more at the link below ⬇️ aisi.gov.uk/work/internati…

2.0K

AI Security Institute Retweeted

Feryal Clark MP@FeryalClark · Jul 16

We want experts to join our new Responsible AI Advisory Panel, to provide independent oversight and strategic advice on government's use of AI. Help develop secure, ethical and responsible public sector AI. gov.uk/government/new… gov.uk/government/gro…

2.0K

AI Security Institute@AISecurityInst · Jul 15

Chain of Thought (CoT) monitoring could be a powerful tool for overseeing future AI systems—especially as they become more agentic. That’s why we’re backing a new research paper from a cross-institutional team of researchers pushing this work forward.

BBowen Baker@bobabowen · Jul 15

Modern reasoning models think in plain English. Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems. I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability.

214

398

3.0K

881

579.0K

AI Security Institute@AISecurityInst · Jul 16

The monitorability of chain of thought is an exciting opportunity for AI safety. But as models get more powerful, it could require ongoing, active commitments to preserve. We’re excited to collaborate with @apolloaievals and many authors from frontier labs on this position paper.

MMikita Balesni 🇺🇦@balesni · Jul 15

A simple AGI safety technique: AI’s thoughts are in plain English, just read them We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it:…

1.0K