Cozmin Ududec
@CUdudec
@AISecurityInst Testing and Science of Evals. Ex quantum foundationalist.
To understand what risks AI systems pose, we need to evaluate the upper limits of their capabilities. Post-training enhancements can raise this limit. That's why we’re prioritising elicitation – uncovering the full range of what models can do 🧵
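A toy illustration of why elicitation effort matters for measured capability: the standard pass@k estimator (from the Codex/HumanEval line of work) shows how giving a model more attempts per task raises the observed success rate. The numbers below are made up for illustration; this is just a sketch of the estimator, not AISI's elicitation pipeline.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k sampled
    attempts succeeds, given c successes observed among n samples."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Made-up numbers: 100 attempts per task, 5 of which succeed.
n, c = 100, 5
for k in (1, 10, 50):
    print(f"pass@{k} = {pass_at_k(n, c, k):.2f}")
# pass@1 ≈ 0.05, pass@10 ≈ 0.42, pass@50 ≈ 0.97 -- more elicitation effort,
# higher measured capability from the same underlying model.
```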
As AI agents near real-world use, how do we know what they can actually do? Reliable benchmarks are critical, but agentic benchmarks are broken! Example: WebArena marks "45+8 minutes" on a duration calculation task as correct (real answer: "63 minutes"). Other benchmarks…
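A minimal sketch of the failure mode being described: a lenient string-overlap scorer can mark a wrong duration answer as correct simply because it shares a token with the reference. This is hypothetical grading code for illustration only, not WebArena's actual checker.

```python
def lenient_match(prediction: str, reference: str) -> bool:
    """Hypothetical lenient scorer: pass if any reference token appears in
    the prediction -- a common source of false positives in agentic evals."""
    pred_tokens = prediction.lower().replace("+", " ").split()
    return any(tok in pred_tokens for tok in reference.lower().split())

# The agent answered "45+8 minutes" (i.e. 53 minutes); the correct duration is 63 minutes.
print(lenient_match("45+8 minutes", "63 minutes"))  # True -- wrong answer scored as correct
```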
In a new paper, we examine recent claims that AI systems have been observed ‘scheming’, or making strategic attempts to mislead humans. We argue that to test these claims properly, more rigorous methods are needed.
I'll be at #ICML2025 next week - reach out if you want to chat about how to make evals more rigorous and predictive, open roles on the Science of Evals team, or what we've been up to at @AISecurityInst. I'll also be giving a talk at the Technical AI Governance Workshop!…
Can we leverage an understanding of what’s happening inside AI models to stop them from causing harm? At AISI, our dedicated White Box Control Team has been working on just this🧵
This is a great study with really good analysis of different factors that might impact the results!
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
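To keep the two percentages straight: the forecast and the measured effect are both relative to task completion time without AI. A toy calculation with made-up times (not the study's data or its actual statistical estimator):

```python
# Toy illustration of perceived speedup vs. measured slowdown (made-up numbers).
baseline_minutes = 100.0   # time to finish a task without AI tools
with_ai_minutes = 119.0    # measured time with AI tools: 19% slower
forecast_minutes = 80.0    # what developers believed with AI: "20% faster"

measured_change = (with_ai_minutes - baseline_minutes) / baseline_minutes
forecast_change = (forecast_minutes - baseline_minutes) / baseline_minutes
print(f"measured: {measured_change:+.0%}")   # +19% (slower)
print(f"forecast: {forecast_change:+.0%}")   # -20% (faster)
```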
🛡️ We're making updates to the AISI Challenge Fund so the application process is faster, clearer and more accessible. More info: aisi.gov.uk/work/new-updat…
🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to answer as AI capabilities grow. It’s our roadmap for tackling the hardest technical challenges in AI security.
New paper! The UK AISI has created RepliBench, a benchmark that measures the abilities of frontier AI systems to autonomously replicate, i.e. spread copies of themselves without human help. Our results suggest that models are rapidly improving, and the best frontier models are…
🚨 New AISI research 🚨 RepliBench is a novel benchmark that measures the ability of frontier AI systems to autonomously replicate. Read the full blog here: aisi.gov.uk/work/replibenc…
Love seeing deeper dives into agent behaviors—this is a great analysis!
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…
Our latest work examines how AI control techniques can help reduce misalignment risks, and how to scale them to keep pace with advances in AI 🛡️ aisi.gov.uk/work/how-to-ev…
To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses. We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇
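I don't know Docent's actual API, so here is a hypothetical sketch of the kind of transcript-level check such a tool automates: scan agent transcripts for suspicious patterns like unhandled tool errors or answers submitted without any tool use. All names, fields, and paths below are invented for illustration.

```python
import json
from pathlib import Path

SUSPECT_MARKERS = ("traceback", "permission denied", "command not found")

def flag_transcript(path: Path) -> list[str]:
    """Hypothetical check: flag transcripts whose tool output contains error
    markers, or where the agent answers without making any tool calls."""
    steps = json.loads(path.read_text())  # assumed: list of {"role", "content"} dicts
    flags = []
    tool_steps = [s for s in steps if s.get("role") == "tool"]
    if not tool_steps and any(s.get("role") == "assistant" for s in steps):
        flags.append("answer submitted with no tool use")
    for s in tool_steps:
        text = s.get("content", "").lower()
        if any(marker in text for marker in SUSPECT_MARKERS):
            flags.append(f"possible unhandled error: {text[:60]!r}")
    return flags

# Usage: point it at a folder of transcript JSON files and review the flags by hand.
for p in Path("transcripts").glob("*.json"):
    for flag in flag_transcript(p):
        print(p.name, "->", flag)
```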
We've published a short overview paper on safety cases for frontier AI. We cover their different use cases, advantages over other regulatory approaches, how they fit with commitments and safety frameworks, and open problems for policy and technical AI safety researchers.