Alexis Bellot
@alexis_bellot_
AI Researcher @GoogleDeepMind
Can we trust a black-box system when all we know is its past behaviour? 🤖🤔 In a new #ICML2025 paper we derive fundamental bounds on the predictability of black-box agents. This is a critical question for #AgentSafety. 🧵
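
A toy sketch of what a predictability bound can look like, under assumptions the paper does not necessarily make: if the agent were a fixed stochastic policy emitting i.i.d. actions, a standard Hoeffding bound would limit how far its long-run action frequencies can drift from its observed past behaviour. The function and numbers below are illustrative only, not the paper's construction.

```python
import math

def hoeffding_bound(n: int, confidence: float = 0.95) -> float:
    """Half-width of a two-sided confidence interval on an action's
    long-run frequency after n i.i.d. observations (toy stationarity
    assumption, not the paper's setting)."""
    delta = 1.0 - confidence
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# After observing 1,000 past actions, an action taken 30% of the time
# has a long-run frequency within roughly +/-4.3% at 95% confidence,
# but only under the i.i.d. assumption made above.
past_freq = 0.30
eps = hoeffding_bound(1000)
print(f"long-run frequency in [{past_freq - eps:.3f}, {past_freq + eps:.3f}]")
```

The sketch only fixes intuitions: guarantees this tight require strong assumptions about the agent, which is what makes bounds for arbitrary black-box agents the harder, more fundamental question.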

2 years ago, @ilyasut made a bold prediction that large neural networks are learning world models through text. Recently, a new paper by @GoogleDeepMind provided a compelling insight into this idea. They found that if an AI agent can tackle complex, long-horizon tasks, it must…

Are world models necessary to achieve human-level agents, or is there a model-free shortcut? Our new #ICML2025 paper tackles this question from first principles and finds a surprising answer: agents _are_ world models… 🧵

What if LLMs are sometimes capable of doing a task but don't try hard enough to do it? In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵
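
A minimal sketch of the subtask idea, where all success rates and the independence assumption are hypothetical rather than taken from the paper: if a model solves each subtask reliably, chaining them at full effort should yield roughly the product of the subtask success rates, so a large shortfall on the full task is evidence it is not fully employing its capabilities.

```python
import random

def simulated_success_rate(true_rate: float, n_trials: int = 1000) -> float:
    """Toy stand-in for running a model on a task n_trials times and
    measuring its empirical success rate."""
    return sum(random.random() < true_rate for _ in range(n_trials)) / n_trials

# Hypothetical rates: ~90% on each of three subtasks predicts roughly
# 0.9 ** 3 = 73% on the composed task, if the model chains them at full effort
# and the subtasks succeed independently.
predicted = 1.0
for rate in [0.9, 0.9, 0.9]:
    predicted *= simulated_success_rate(rate)

observed = simulated_success_rate(0.40)  # what the model actually achieves
print(f"predicted from subtasks: {predicted:.2f}")
print(f"observed on full task:   {observed:.2f}")
print(f"goal-directedness gap:   {predicted - observed:.2f}")
```

A positive gap in this toy setup is the signature the tweet describes: the subtasks show the capability is there, but the model does not deploy it fully on the composed task.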