Alexander Ku
@alex_y_ku
Cognitive scientist and AI researcher at @GoogleDeepMind and @Princeton
(1/11) Evolutionary biology offers a powerful lens into Transformer learning dynamics! Two learning modes in Transformers (in-weights & in-context) mirror adaptive strategies in evolution. Crucially, environmental predictability shapes both systems similarly.

Claude Opus 4 and Sonnet 4 are the best coding models, setting new records across the board. 🚀 We are pushing the limits (80.2% on SWE-Bench!!), advancing the frontier while keeping up the momentum. The benchmarks may soon become saturated but the capabilities will not!
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
So excited our paper is now out in @CognitionJourn! Huge thanks to our editor and reviewers 🧠 Their thoughtful suggestions inspired Experiments 3 & 4, which revealed a striking inverse correlation between idleness judgments and speed-up predictions.
“People Evaluate Idle Collaborators Based on their Impact on Task Efficiency” 📢 New from: Elizabeth Mieczkowski, Cameron Rouse Turner, Natalia Vélez, & Tom Griffiths sciencedirect.com/science/articl… TL;DR: Sometimes it's acceptable not to help with group work 🧵👇
Excited to share our new paper on neural networks learning base addition. We found that, if they use the right symmetries, even simple neural networks can achieve radical generalization, and that learnability is closely correlated with the symmetry used. 🧵
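A minimal sketch of the idea (my own toy construction, not the paper's architecture or training setup): sharing one per-digit adder across digit positions builds in a translation symmetry over digits, so a tiny network trained only on single-digit sums generalizes to additions far longer than anything it saw.

```python
# Toy illustration of symmetry-enabled generalization (assumptions mine, not the
# paper's setup): one small MLP handles a single digit position, and weight sharing
# across positions (a translation symmetry over digits) does the rest.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

BASE = 10
torch.manual_seed(0)
random.seed(0)

class DigitAdder(nn.Module):
    """Maps (digit_a, digit_b, carry_in) to (sum-digit logits, carry-out logit)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * BASE + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, BASE + 1))

    def forward(self, a, b, carry):
        x = torch.cat([F.one_hot(a, BASE).float(),
                       F.one_hot(b, BASE).float(),
                       carry.float().unsqueeze(-1)], dim=-1)
        out = self.net(x)
        return out[:, :BASE], out[:, BASE]

# Train on every single-digit case: just 10 * 10 * 2 = 200 examples.
grid = torch.cartesian_prod(torch.arange(BASE), torch.arange(BASE), torch.arange(2))
a, b, cin = grid[:, 0], grid[:, 1], grid[:, 2]
digit_target = (a + b + cin) % BASE
carry_target = ((a + b + cin) >= BASE).float()

model = DigitAdder()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    digit_logits, carry_logit = model(a, b, cin)
    loss = (F.cross_entropy(digit_logits, digit_target)
            + F.binary_cross_entropy_with_logits(carry_logit, carry_target))
    opt.zero_grad(); loss.backward(); opt.step()

def add(model, xs, ys):
    """Apply the shared module over digit positions, least significant first."""
    carry = torch.zeros(1)
    out = []
    for xa, xb in zip(xs, ys):
        dl, cl = model(torch.tensor([xa]), torch.tensor([xb]), carry)
        out.append(dl.argmax(-1).item())
        carry = (cl > 0).float()
    return out  # final carry-out dropped for simplicity

# Numbers far longer than anything seen in training (digits, least significant first).
xs = [random.randrange(BASE) for _ in range(20)]
ys = [random.randrange(BASE) for _ in range(20)]
total = (sum(d * BASE**i for i, d in enumerate(xs))
         + sum(d * BASE**i for i, d in enumerate(ys)))
print(add(model, xs, ys))
print([(total // BASE**i) % BASE for i in range(20)])
```

The point of the sketch is the design choice, not the numbers: the generalization comes from the shared per-digit module, i.e. from putting the symmetry into the architecture rather than hoping the network discovers it.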
New review on computational approaches to studying human planning out now in @TrendsCognSci! Really enjoyed having the opportunity to write something broader about the field with the help of @evanrussek @marcelomattar @weijima01 and @cocosci_lab cell.com/trends/cogniti…
🤖 Household robots are becoming physically viable. But interacting with people in the home requires handling unseen, unconstrained, dynamic preferences, not just a complex physical domain. We introduce ROSETTA: a method to cheaply generate rewards for such preferences. 🧵⬇️
Transformers employ different strategies throughout training to minimize loss, but how do these strategies trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬
How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: Training Dynamics of In-Context Learning in Linear Attention arxiv.org/abs/2501.16265 Led by Yedi Zhang with @Aaditya6284 and Peter Latham
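For intuition, here is a sketch under my own assumptions (not the paper's exact construction): with the softmax removed, a single linear-attention readout over in-context (x_i, y_i) pairs can implement one gradient-descent step on the in-context regression problem, which is the kind of mechanism these training-dynamics analyses characterize.

```python
# Linear attention on an in-context regression prompt: output = sum_i value_i * (key_i . query),
# with keys/queries the x's and values the y's. The matrix W below is hand-set to (1/n) I
# purely for illustration; the paper studies how such weights evolve during training.
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 8, 32

# A fresh linear-regression task per "prompt": context pairs plus one query point.
w_true = rng.normal(size=d)
X = rng.normal(size=(n_ctx, d))
y = X @ w_true
x_query = rng.normal(size=d)

# Linear-attention readout (no softmax).
W = np.eye(d) / n_ctx
y_hat = sum(y_i * (x_i @ W @ x_query) for x_i, y_i in zip(X, y))

# Identical (up to floating point) to predicting with one GD step on the in-context
# least-squares loss, starting from w = 0 with unit learning rate.
w_one_step = (X.T @ y) / n_ctx
print(y_hat, x_query @ w_one_step)
```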
🧵What if emergence could be explained by learning a specific circuit: sparse attention? Our new work explores this bold hypothesis, showing a link between emergence and sparse attention that reveals how data properties influence when emergence occurs during training.
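A toy way to make "sparse attention" concrete (my own illustration, not the paper's method): sparse here means an attention row concentrated on a few tokens, and the entropy of that row is one simple diagnostic for when such a pattern forms during training.

```python
# Dense vs. sparse attention rows, and entropy as a sparsity diagnostic.
import numpy as np

def attention_weights(scores):
    """Softmax over key positions for one query's attention scores."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum()

diffuse = attention_weights(np.zeros(16))                  # early training: near-uniform row
sparse = attention_weights(np.array([8.0] + [0.0] * 15))   # later: one token dominates
print(entropy(diffuse), entropy(sparse))                   # ~log(16) vs. near zero
```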
Are you considering attending @cogsci_soc this year? Come to our workshop, 'Reasoning Across Minds and Machines', which features an exciting lineup of interdisciplinary research in AI and CogSci about reasoning. The workshop is W2 and runs alongside the main conference.