Marie Davidsen Buhl
@MarieBassBuhl
Research Scientist @AISecurityInst | AI Policy Researcher @GovAI_ | Frontier AI Safety Cases
Do you know cognitive scientists / folks who run behavioural experiments with human participants? Refer them to my team!
We're hiring a cognitive scientist to join the AISI Alignment Team! Cognitive science is a crucial field that we want to galvanise to help solve one of the most important problems of our time. Could you lead that effort?
More safety case sketches!
New paper! With @joshua_clymer, Jonah Weinbaum and others, we’ve written a safety case for safeguards against misuse. We lay out how developers can connect safeguard evaluation results to real-world decisions about how to deploy models. 🧵
Come work with me!! I'm hiring a research manager for @AISecurityInst's Alignment Team. You'll manage exceptional researchers tackling one of humanity’s biggest challenges. Our mission: ensure we have ways to make superhuman AI safe before it poses critical risks. 1/4
New work from my colleagues! We want AIs to do open-ended research tasks with no single right answer. How do we make sure AIs don't use that freedom to subtly mislead or cause harm? The proposal: Check that the answers are random along relevant dimensions. V cool work!
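A minimal sketch of what "random along relevant dimensions" could look like in practice (my own toy example and function names, not the authors' method): collect the model's choices on a dimension where any option is equally acceptable, then test for systematic skew.

```python
# Toy sketch, not the authors' method: if a model is free to choose among k
# equally-acceptable options, its choices along that free dimension should look
# roughly random; a strong skew suggests the freedom is being used to steer outcomes.
from collections import Counter
from scipy.stats import chisquare

def looks_random(choices, k, alpha=0.01):
    """Chi-squared goodness-of-fit: are choices over k equally-good options ~uniform?"""
    counts = [Counter(choices).get(i, 0) for i in range(k)]
    _, p_value = chisquare(counts)  # default expectation is uniform
    return p_value > alpha  # False = suspiciously non-random along this dimension

# A model that always picks option 0 when any option would do gets flagged:
print(looks_random([0] * 100, k=4))            # False
print(looks_random(list(range(4)) * 25, k=4))  # True
```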
New work with @geoffreyirving: "Unexploitable search: blocking malicious use of free parameters". We formalize how a misaligned AI can exploit underspecified objectives to optimize for hidden goals, and propose zero-sum games as a solution. [1/3]
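Rough formal picture of the problem and the zero-sum idea, in my own notation (not necessarily the paper's formalisation):

```latex
% My notation, not necessarily the paper's. For a task x, let S(x) be the set of
% outputs that all satisfy the stated objective; the choice within S(x) is a free
% parameter. A misaligned model with hidden utility u can exploit that slack
% without ever violating the specification:
\[
  y^{*}(x) \;=\; \arg\max_{y \in S(x)} u(y).
\]
% The zero-sum idea: make the choice within S(x) the outcome of a zero-sum game,
% so any systematic push toward a hidden goal by one player is a payoff the other
% player is incentivised to cancel out:
\[
  \max_{\pi_1} \, \min_{\pi_2} \; \mathbb{E}_{y \sim (\pi_1, \pi_2)}\!\left[ r(x, y) \right],
  \qquad \text{with player 2 receiving } -r(x, y).
\]
```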
More work from my team on alignment safety cases!
Humans are often very wrong. This is a big problem if you want to use human judgment to oversee super-smart AI systems. In our new post, @geoffreyirving argues that we might be able to deal with this issue – not by fixing the humans, but by redesigning oversight protocols.
Want to build an aligned ASI? Our new paper explains how to do that, using debate. Tl;dr: Debate + exploration guarantees + no obfuscated arguments + good human input = outer alignment. Outer alignment + online training = inner alignment*. (*Sufficient for low-stakes contexts.)
We wrote out a very speculative safety case sketch for low-stakes alignment, based on safe-but-intractable computations using humans, scalable oversight, and learning theory + exploration guarantees. It does not work yet; the goal is to find and clarify alignment subproblems. 🧵
Can we massively scale up AI alignment research by identifying subproblems many people can work on in parallel? UK AISI’s alignment team is trying to do that. We’re starting with AI safety via debate - and we’ve just released our first paper 🧵 1/
When scalable oversight techniques like debate show empirical success, what additional evidence will we need to ensure the resulting models are aligned? We work through the details of what we need to assume about deployment context, training data, training dynamics, and…