Tom Everitt
@tom4everitt
AGI safety researcher at @GoogleDeepMind, leading http://causalincentives.com; switching to https://bsky.app/profile/tom4everitt.bsky.social
SoTA’s Human Augmentation Hackathon, 26-27th July We’re seeking new tools that foster human-AI symbiosis. What you build will expand our capabilities, rather than writing us out of the story. Success will be judged against the following criteria: - Is your demo better than…
pretty cool effort to distribute economic power in the age of powerful AI
With @luke_drago_, I’m cofounding Workshop Labs, a public benefit corporation preventing human disempowerment from AI. See below for: -impact case -what we’re building -what we hope the future looks like -what we’re hiring for
Someone needs to use this as the basis of an unsupervised environment design algorithm to give AI designers direct control over agent behavior
Causality is about predicting how interventions affect outcomes. Can we use causality to predict how environment changes affect agent behavior? We explore this idea in a new paper
Can we trust a black-box system, when all we know is its past behaviour? 🤖🤔 In a new #ICML2025 paper we derive fundamental bounds on the predictability of black-box agents. This is a critical question for #AgentSafety. 🧵
In real life, agents with different subjective beliefs interact in a shared objective reality. They have higher-order beliefs about each other's beliefs and goals, which are required for phenomena involving theory of mind, like deception. Our paper formalises this in causal models.
One thing that I really like about this is that my content is determined much less by who I follow than by which posts I like. This means I can express my approval for a post without worrying that similar content will now flood my feed.
Instead, there's a marketplace of content selection algorithms. My favourites are:
* "Following": simple chronological feed (default)
* "Quiet posters": posts from less frequent posters in your feed
* "Paper Skygest": posts about papers