Cozmin Ududec
@CUdudec
@AISecurityInst Testing and Science of Evals. Ex quantum foundationalist.
To understand what risks AI systems pose, we need to evaluate the upper limits of their capabilities. Post-training enhancements can raise this limit. That's why we’re prioritising elicitation – uncovering the full range of what models can do 🧵
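A toy illustration of why elicitation effort matters for measured capability: the standard pass@k estimator (from the Codex/HumanEval line of work) shows how giving a model more attempts per task raises the observed success rate. The numbers below are made up for illustration; this is just a sketch of the estimator, not AISI's elicitation pipeline.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k sampled
    attempts succeeds, given c successes observed among n samples."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Made-up numbers: 100 attempts per task, 5 of which succeed.
n, c = 100, 5
for k in (1, 10, 50):
    print(f"pass@{k} = {pass_at_k(n, c, k):.2f}")
# pass@1 ≈ 0.05, pass@10 ≈ 0.42, pass@50 ≈ 0.97 -- more elicitation effort,
# higher measured capability from the same underlying model.
```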
As AI agents near real-world use, how do we know what they can actually do? Reliable benchmarks are critical, but agentic benchmarks are broken! Example: WebArena marks "45+8 minutes" on a duration calculation task as correct (real answer: "63 minutes"). Other benchmarks…
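A minimal sketch of the failure mode being described: a lenient string-overlap scorer can mark a wrong duration answer as correct simply because it shares a token with the reference. This is hypothetical grading code for illustration only, not WebArena's actual checker.

```python
def lenient_match(prediction: str, reference: str) -> bool:
    """Hypothetical lenient scorer: pass if any reference token appears in
    the prediction -- a common source of false positives in agentic evals."""
    pred_tokens = prediction.lower().replace("+", " ").split()
    return any(tok in pred_tokens for tok in reference.lower().split())

# The agent answered "45+8 minutes" (i.e. 53 minutes); the correct duration is 63 minutes.
print(lenient_match("45+8 minutes", "63 minutes"))  # True -- wrong answer scored as correct
```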
In a new paper, we examine recent claims that AI systems have been observed ‘scheming’, or making strategic attempts to mislead humans. We argue that to test these claims properly, more rigorous methods are needed.
I'll be at #ICML2025 next week - reach out if you want to chat about how to make evals more rigorous and predictive, open roles on the Science of Evals team, or what we've been up to at @AISecurityInst. I'll also be giving a talk at the Technical AI Governance Workshop!…
Can we leverage an understanding of what’s happening inside AI models to stop them from causing harm? At AISI, our dedicated White Box Control Team has been working on just this🧵
This is a great study with really good analysis of different factors that might impact the results!
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
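To keep the two percentages straight: the forecast and the measured effect are both relative to task completion time without AI. A toy calculation with made-up times (not the study's data or its actual statistical estimator):

```python
# Toy illustration of perceived speedup vs. measured slowdown (made-up numbers).
baseline_minutes = 100.0   # time to finish a task without AI tools
with_ai_minutes = 119.0    # measured time with AI tools: 19% slower
forecast_minutes = 80.0    # what developers believed with AI: "20% faster"

measured_change = (with_ai_minutes - baseline_minutes) / baseline_minutes
forecast_change = (forecast_minutes - baseline_minutes) / baseline_minutes
print(f"measured: {measured_change:+.0%}")   # +19% (slower)
print(f"forecast: {forecast_change:+.0%}")   # -20% (faster)
```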
🛡️ We're making updates to the AISI Challenge Fund so the application process is faster, clearer and more accessible. More info: aisi.gov.uk/work/new-updat…
🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to answer as AI capabilities grow. It’s our roadmap for tackling the hardest technical challenges in AI security.
New paper! The UK AISI has created RepliBench, a benchmark that measures the abilities of frontier AI systems to autonomously replicate, i.e. spread copies of themselves without human help. Our results suggest that models are rapidly improving, and the best frontier models are…
🚨 New AISI research 🚨 RepliBench is a novel benchmark that measures the ability of frontier AI systems to autonomously replicate. Read the full blog here: aisi.gov.uk/work/replibenc…
Love seeing deeper dives into agent behaviors—this is a great analysis!
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…
Our latest work examines how AI control techniques can help reduce misalignment risks, and how to scale them to keep pace with advances in AI 🛡️ aisi.gov.uk/work/how-to-ev…
To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses. We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇
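I don't know Docent's actual API, so here is a hypothetical sketch of the kind of transcript-level check such a tool automates: scan agent transcripts for suspicious patterns like unhandled tool errors or answers submitted without any tool use. All names, fields, and paths below are invented for illustration.

```python
import json
from pathlib import Path

SUSPECT_MARKERS = ("traceback", "permission denied", "command not found")

def flag_transcript(path: Path) -> list[str]:
    """Hypothetical check: flag transcripts whose tool output contains error
    markers, or where the agent answers without making any tool calls."""
    steps = json.loads(path.read_text())  # assumed: list of {"role", "content"} dicts
    flags = []
    tool_steps = [s for s in steps if s.get("role") == "tool"]
    if not tool_steps and any(s.get("role") == "assistant" for s in steps):
        flags.append("answer submitted with no tool use")
    for s in tool_steps:
        text = s.get("content", "").lower()
        if any(marker in text for marker in SUSPECT_MARKERS):
            flags.append(f"possible unhandled error: {text[:60]!r}")
    return flags

# Usage: point it at a folder of transcript JSON files and review the flags by hand.
for p in Path("transcripts").glob("*.json"):
    for flag in flag_transcript(p):
        print(p.name, "->", flag)
```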
We've published a short overview paper on safety cases for frontier AI. We cover their different use cases, advantages over other regulatory approaches, how they fit with commitments and safety frameworks, and open problems for policy and technical AI safety researchers.