Hannah Rose Kirk
@hannahrosekirk
AI researcher trying to make sense of all things cyberspace 🤖 Uni of Ox PhD (loading…) @oiioxford & @AISecurityInst. Prev @turinginst & @Cambridge_Uni.
A real honour and career dream that PRISM has won a @NeurIPSConf best paper award! 🌈 One year ago I was sat in a 13,000+ person audience of NeurIPS '23 having just finished data collection. Safe to say I've gone from feeling #stressed to very #blessed 😁
Announcing the NeurIPS 2024 Best Paper Awards: blog.neurips.cc/2024/12/10/ann…
This is *the* paper to read this week. It covers an astonishing amount of ground on the persuasive capabilities of frontier AI - from scaling laws, to post-training, to the driving mechanisms of a persuasive advantage. Very proud of @KobiHackenburg + the team at @AISecurityInst!
Today (w/ @UniofOxford @Stanford @MIT @LSEnews) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues. We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more 🧵
1. How can we remain healthy and free while engaging in extended personal interaction with AI agents that shape our behaviour and preferences? One answer is "socioaffective alignment" as discussed in our new paper @Nature Humanities & Social Sciences! nature.com/articles/s4159…
🚨 New AISI research 🚨 RepliBench is a novel benchmark that measures the ability of frontier AI systems to autonomously replicate. Read the full blog here: aisi.gov.uk/work/replibenc…
How can we ensure AI systems stay safe and aligned in the context of long-term personal interaction with AI agents? New research with @hannahrosekirk, @summerfieldlab, @bertievidgen & @computermacgyve aims to answer this question!