AI Security Institute
@AISecurityInst
We conduct scientific research to understand AI’s most serious risks and develop and test mitigations.
🌏Our largest international joint testing exercise to date brought together participants from across the globe to advance the science of AI agent evaluations. Read more at the link below ⬇️ aisi.gov.uk/work/internati…
We want experts to join our new Responsible AI Advisory Panel, to provide independent oversight and strategic advice on government's use of AI. Help develop secure, ethical and responsible public sector AI. gov.uk/government/new… gov.uk/government/gro…
Chain of Thought (CoT) monitoring could be a powerful tool for overseeing future AI systems—especially as they become more agentic. That’s why we’re backing a new research paper from a cross-institutional team of researchers pushing this work forward.
Modern reasoning models think in plain English. Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems. I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability.
The monitorability of chain of thought is an exciting opportunity for AI safety. But as models get more powerful, it could require ongoing, active commitments to preserve. We’re excited to collaborate with @apolloaievals and many authors from frontier labs on this position paper.
A simple AGI safety technique: AI’s thoughts are in plain English, just read them We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it:…