Stephanie Chan
@scychan_brains
Staff Research Scientist at Google DeepMind. Artificial & biological brains 🤖 🧠 Views are my own.
Check out our new work: Generalization from context often outperforms generalization from finetuning. And you might get the best of both worlds by spending extra compute at train-time.
How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/
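For intuition only, a toy contrast between the two modes of learning (not the paper's actual setup): the same facts are either placed in the prompt (in-context) or pushed into the weights with a few gradient steps (finetuning), and then the model is queried. The model name, made-up facts, and training details below are placeholder assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any small causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

facts = ["A glon is a kind of bird.", "Birds can fly."]  # toy, made-up facts
query = "Can a glon fly? Answer:"

# (1) In-context learning: the facts live only in the prompt; no weights change.
prompt = " ".join(facts) + " " + query
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=5)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:]))

# (2) Finetuning: the same facts are pushed into the weights via gradient steps,
# then the model is queried with no supporting context at all.
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
for _ in range(3):  # a few toy gradient steps
    for fact in facts:
        batch = tok(fact, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()

inputs = tok(query, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=5)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:]))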
This paper is absolutely essential reading for anyone interested in developing a science of AI safety and evaluation. I esp. appreciate the “principle of parsimony”: Behaviours should not be attributed to complex mental processes if simpler explanations are available ✅
In a new paper, we examine recent claims that AI systems have been observed ‘scheming’, or making strategic attempts to mislead humans. We argue that to test these claims properly, more rigorous methods are needed.
Quick thread on the recent IMO results and the relationship between symbol manipulation, reasoning, and intelligence in machines and humans:
🚨 NEW PAPER: “Democratic AI is Possible: The Democracy Levels Framework Shows How It Might Work” #icml2025 AI is reshaping our world. How should we steer its development? We introduce the Democracy Levels Framework to define concrete milestones toward meaningfully…
New ideas for our information ecosystem
🚨🚨 Excited to share a new paper led by @Li_Haiwen_ with the @CommunityNotes team! LLMs will reshape the information ecosystem. Community Notes offers a promising model for keeping human judgment central but it's an open question how to best integrate LLMs. Thread👇
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an…
Everyone’s hyped about test-time scaling—more steps, longer traces, just add “Wait” or “Let me rethink,” and boom: better reasoning? Not quite. We find that performance almost always improves at first—then declines. Classic overthinking. That’s not news. But why does it happen?…
🔥 Does test-time scaling in #reasoningmodels via thinking more always help? 🚫 No - performance increases at first and then drops due to #Overthinking ❓ Why does this happen, and how can it be mitigated? 🚀 Check our recent findings #LLMReasoning Link: arxiv.org/pdf/2506.04210
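For readers unfamiliar with the knob being scaled here: one common recipe for scaling test-time compute is to keep nudging the model to continue its reasoning trace (e.g. by appending "Wait") before letting it answer. A minimal sketch of that loop, under assumptions, is below; generate is a hypothetical helper wrapping any completions API, and this illustrates the general recipe rather than the paper's code.

def generate(prompt: str, stop: str) -> str:
    """Hypothetical helper: returns model text generated after `prompt`, up to `stop`."""
    raise NotImplementedError

def think_with_budget(question: str, extra_rounds: int) -> str:
    # First pass of reasoning, up to where the model would normally stop thinking.
    trace = generate(f"{question}\n<think>", stop="</think>")
    for _ in range(extra_rounds):
        # Instead of letting the model close its reasoning, append "Wait" and continue.
        trace += "\nWait," + generate(f"{question}\n<think>{trace}\nWait,", stop="</think>")
    # More rounds often help at first, then hurt as the trace grows (the overthinking effect).
    return generate(f"{question}\n<think>{trace}\n</think>\nAnswer:", stop="\n")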
While I'm promoting @summerfieldlab's work anyway.. I highly recommend this paper on both the risks and opportunities of AI for democracy: arxiv.org/abs/2409.06729 It's exactly the kind of balanced analysis we need more of, but which unfortunately does not gain as much attention…
Important lessons on rigorous evaluation of AI model behaviors. Drawing on the historical example (and fun story) of hype around "chimps learning language". Given the importance of AI safety research, rigor and credibility is absolutely necessary. A great read from the folks at…
In a new paper, we examine recent claims that AI systems have been observed ‘scheming’, or making strategic attempts to mislead humans. We argue that to test these claims properly, more rigorous methods are needed.
Do you have a PhD (or equivalent) or will have one in the coming months (i.e. 2-3 months away from graduating)? Do you want to help build open-ended agents that help humans do human things better, rather than replace them? We're hiring 1-2 Research Scientists! Check the 🧵👇
I have been waiting for this to be announced. It's so amazing to see such elegant scaling of the Deep Think system, where the same system can now achieve gold at the IMO! deepmind.google/discover/blog/…
We’re hiring a sociological research scientist @GoogleDeepMind! Work with the inimitable @KLdivergence, @weidingerlaura, @iamtrask, @canfer_akbulut, Julia Haas & many others 🙌
I'm hiring! job-boards.greenhouse.io/deepmind/jobs/…
Excited to present this work in Vancouver at #ICML2025 today 😀 Come by to hear about why in-context learning emerges and disappears: Talk: 10:30-10:45am, West Ballroom C Poster: 11am-1:30pm, East Exhibition Hall A-B # E-3409
Transformers employ different strategies over the course of training to minimize loss, but how do these trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬
An important line of research -- understanding complementarity between humans and AIs
How do we ensure humans can still effectively oversee increasingly powerful AI systems? In our blog, we argue that achieving Human-AI complementarity is an underexplored yet vital piece of this puzzle! It's hard, but we achieved it. 🧵(1/10)
We’re bringing powerful AI directly onto robots with Gemini Robotics On-Device. 🤖 It’s our first vision-language-action model to help make robots faster, highly efficient, and adaptable to new tasks and environments - without needing a constant internet connection. 🧵
I wonder if LLM sycophancy is actually also second-order… We know the responses are a reflection of us (writing style etc.), we like what we see, and that makes us feel good about ourselves too
Super happy and proud to share our novel scalable RNN model - the MesaNet! This work builds upon beautiful ideas of locally optimal test-time training (TTT), and combines ideas of in-context learning, test-time training and mesa-optimization.
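To unpack the mesa-optimization idea in plain terms (my paraphrase of the general concept, not the MesaNet architecture itself): at each timestep, a mesa-layer's output uses the linear map that optimally solves a regularized regression from the keys seen so far to their values, i.e. a test-time training problem solved to local optimality at every step. A toy numpy sketch under those assumptions:

import numpy as np

def mesa_layer(K, V, Q, lam=1.0):
    """K, V, Q: (T, d) arrays of keys, values, queries; returns (T, d) outputs."""
    T, d = K.shape
    out = np.zeros_like(Q)
    for t in range(T):
        Kt, Vt = K[: t + 1], V[: t + 1]
        # W_t = argmin_W sum_{i<=t} ||W k_i - v_i||^2 + lam * ||W||_F^2
        W_t = Vt.T @ Kt @ np.linalg.inv(Kt.T @ Kt + lam * np.eye(d))
        out[t] = W_t @ Q[t]
    return out

# Toy usage: with enough context, the layer recovers a hidden linear mapping.
rng = np.random.default_rng(0)
d, T = 4, 32
W_true = rng.normal(size=(d, d))
K = rng.normal(size=(T, d))
V = K @ W_true.T
print(np.allclose(mesa_layer(K, V, K, lam=1e-6)[-1], V[-1], atol=1e-3))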