Jon Richens
@jonathanrichens
Research scientist in AI safety @GoogleDeepMind
Are world models necessary to achieve human-level agents, or is there a model-free shortcut? Our new #ICML2025 paper tackles this question from first principles and finds a surprising answer: agents _are_ world models… 🧵

2 years ago, @ilyasut made a bold prediction that large neural networks are learning world models through text. Recently, a new paper by @GoogleDeepMind provided compelling insight into this idea. They found that if an AI agent can tackle complex, long-horizon tasks, it must…
Can we trust a black-box system when all we know is its past behaviour? In a new #ICML2025 paper we derive fundamental bounds on the predictability of black-box agents. This is a critical question for #AgentSafety. 🧵
What if LLMs are sometimes capable of doing a task but don't try hard enough to do it? In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵
If I talk to one more person who says "but even if this research direction led to a massive breakthrough in our scientific understanding of neural networks/deep learning/agent foundations, how would that help with AI safety?" I will become the joker.
This year #ICML started a "position paper" track aimed at stimulating discussions. Reader, I chose violence... The Causal Revolution Needs Scientific Pragmatism. Full text: arxiv.org/abs/2406.02275
I'm hiring ambitious Research Scientists at @AnthropicAI to measure and prepare for models acting autonomously in the world. This is one of the most novel and difficult capabilities to measure, and critical for safety. Join the Frontier Red Team at Anthropic:…
How should we understand A.I. agents? This blog by @tom4everitt provides one of the clearest and most complete accounts I've seen yet. Well worth checking out, alongside the wider causality research agenda: alignmentforum.org/s/pcdHisDEGLbx…