Yarin
@yaringal
Associate Professor of Machine Learning, University of Oxford; @OATML_Oxford Group Leader; Director of Research at AISI (formerly the UK Taskforce on Frontier AI)
We at @AISecurityInst worked with @OpenAI to test & improve Agent’s safeguards prior to release. A few notes on our experience🧵 1/4
My friends, I want to organise a Secure AI Club in London -- a meetup for people interested in (practical!) AI security. Not just academic toy setups, but actually making systems reliable. Trying to gauge interest, please sign up here: forms.gle/zSUMh6ykthQwtt…
Self-improvement (cf DeepSeek, o3, Gemini Thinking) is the process of turning unknown knowns into known knowns. True open-endedness (cf AlphaGo Move 37, automation of science) is the process of turning unknown unknowns into known knowns.
Really delighted with the outcome of the Spending Review: £2bn to support the AI Opportunities Action Plan, including £500m for SovereignAI. So much to do but this gives the UK a great foundation.
Evaluating forgetting is hard. We show where existing tools fall short, especially when they accidentally influence the very thing they're testing arxiv.org/pdf/2506.00688 @zhilifeng @YixuanEvenXu @AlexRobey23 @_robertkirk @alxndrdavies @yaringal and @zicokolter
Funding opportunity with the UK's AI Security Institute! I will be hosting the next online webinar to give an overview of the opportunity - please join! aisi.gov.uk/work/new-updat…
⚠️ This is insane — and not in a good way. Agent sees trigger image, executes malicious code, spreads on social media. Totally new kind of computer worm. 😱
Hot take: I think we just demonstrated the first AI agent computer worm 🤔 When an agent sees a trigger image, it's instructed to execute malicious code and then share the image on social media, triggering other users' agents. This is a chance to talk about agent security 👇
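The propagation dynamic described in the tweet can be sketched as a toy simulation. This is a hypothetical illustration only, assuming a simplified model where "executing malicious code" is just a flag and the "social feed" is a list; the class and function names (`Image`, `Agent`, `simulate`) are made up for this sketch and do not come from the actual demonstration.

```python
# Toy simulation of the worm step: an agent that views a trigger image
# is "compromised" and reposts the image, exposing the next agent.
# No real code execution happens anywhere here.
from dataclasses import dataclass


@dataclass
class Image:
    trigger: bool = False  # does the image carry the injection payload?


@dataclass
class Agent:
    name: str
    compromised: bool = False

    def view(self, image: Image, feed: list) -> None:
        # If the image carries the trigger, the agent follows the
        # injected instruction: run the payload (simulated by setting a
        # flag) and repost the image so other users' agents see it.
        if image.trigger and not self.compromised:
            self.compromised = True
            feed.append(Image(trigger=True))  # repost -> worm spreads


def simulate(n_agents: int = 5) -> int:
    feed = [Image(trigger=True)]  # attacker seeds one malicious image
    agents = [Agent(f"agent{i}") for i in range(n_agents)]
    for agent in agents:
        if feed:
            agent.view(feed.pop(0), feed)  # each agent sees the top post
    return sum(a.compromised for a in agents)
```

Because every compromised agent reposts the trigger image, a single seeded image is enough to walk down the whole chain of agents, which is what makes this a worm rather than a one-off prompt injection.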
Veo 3 lands in the UK and is now also available on the Gemini app. Sound on!
Last Friday, we shipped Veo 3 to 71 new countries, and Pro and Ultra members got more credits. All week we've been scrambling to keep everything up and running - way, way, way more demand than we expected! Today, 2 more updates: + The UK now has Veo 3 access 🇬🇧 + Pro and…
I think it's quite misleading for the big labs to be promoting how well their VLMs work on pokemon, given how much (game-specific) manual annotation is required behind the scenes. Solving general tasks from pixel input is much harder than coding ("Moravec's revenge").
Yep, that's exactly what I expected. They test VLMs on gameplay without a scaffold: image in, action out. And they can't play. One small piece of feedback for the authors: include a random agent (maybe best of 100 or so) as a baseline.
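The suggested best-of-N random baseline is easy to sketch. This is a hedged illustration, not the authors' evaluation code: `ToyGame` is a made-up stand-in environment (reward for matching a hidden action sequence), and in practice you would plug the real game interface in its place. The point is that a model's score only means something if it clears what 100 random rollouts can achieve by luck.

```python
# Best-of-N random-agent baseline, sketched against a toy environment.
import random


class ToyGame:
    """Stand-in environment: 10 steps, 4 actions, +1 per correct action."""
    N_ACTIONS, HORIZON = 4, 10

    def __init__(self, seed: int):
        rng = random.Random(seed)
        self.target = [rng.randrange(self.N_ACTIONS) for _ in range(self.HORIZON)]

    def play(self, policy) -> int:
        # policy: step index -> action; score is number of correct actions
        return sum(policy(t) == a for t, a in enumerate(self.target))


def best_of_n_random(n: int = 100, seed: int = 0) -> int:
    """Best score achieved by n independent random rollouts."""
    rng = random.Random(seed)
    game = ToyGame(seed=42)
    random_policy = lambda t: rng.randrange(ToyGame.N_ACTIONS)
    return max(game.play(random_policy) for _ in range(n))
```

A VLM policy that cannot beat `best_of_n_random(100)` on the real game is not demonstrably doing anything beyond chance, which is exactly why the baseline belongs in the paper.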
Thanks @kjw_chiu for linking to this satisfying article, which confirms my mental model for what is going on, and also resolves some of my own concerns with that explanation. physics.stackexchange.com/questions/1110…
Claude 4 just refactored my entire codebase in one call. 25 tool invocations. 3,000+ new lines. 12 brand new files. It modularized everything. Broke up monoliths. Cleaned up spaghetti. None of it worked. But boy was it beautiful.
Nice chance to work on some of the most exciting problems of our time!
🚨 We’re hiring! Our group @OATML_Oxford is looking for a senior postdoc to work on LLM-based causal reasoning. Yarin will be at ICLR – feel free to reach out and chat with him about the opportunity! 🔍📩 Please share with anyone you think this might be relevant to!
We have a senior postdoc position available with @OATML_Oxford (closing 19/05) to lead work on LLM-based causal reasoning with GSK. Please share with anyone you think this might be relevant to! my.corehr.com/pls/uoxrecruit…
Thrilled to share that I’ve joined the @OATML_Oxford as a postdoc, working with @yaringal! Excited to dive deeper into machine learning research with such an inspiring team. 👋 DMs open – happy to connect, chat, and collaborate!
I will be at ICLR if anyone wants to chat about this / other opportunities with the group. DM me

⚠️Beware: Your AI assistant could be hijacked just by encountering a malicious image online! Our latest research exposes critical security risks in AI assistants. An attacker can hijack them by simply posting an image on social media and waiting for it to be captured. [1/6] 🧵
Fundamental Limitations in Defending LLM Finetuning APIs Xander Davies (@alxndrdavies), Eric Winsor, @tomekkorbak, Alexandra Souly (@AlexandraSouly), Robert Kirk (@_robertkirk), Christian Schroeder de Witt (@casdewitt), @yaringal
This is a great opportunity to join a really strong team - I've been working with this team very closely over the past year and a half, and would highly recommend the opportunity to join. Please share with people for whom you reckon this might be useful!
My team is hiring @AISecurityInst! I think this is one of the most important times in history to have strong technical expertise in government. Join our team understanding and fixing weaknesses in frontier models through state-of-the-art adversarial ML research & testing. 🧵 1/4