Aaditya Singh
@Aaditya6284
Doing a PhD @GatsbyUCL with @SaxeLab, @FelixHill84 on learning dynamics, ICL, LLMs. Prev. at: @GoogleDeepMind, @AIatMeta (Llama 3), @MIT. http://aadityasingh.github.io
Transformers employ different strategies through training to minimize loss, but how do these trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬

How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an…
9/N Still—this underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold.
1/N I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).
Come chat about this at the poster @icmlconf, 11:00-13:30 on Wednesday in the West Exhibition Hall #W-902!
How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: "Training Dynamics of In-Context Learning in Linear Attention" (arxiv.org/abs/2501.16265), led by Yedi Zhang with @Aaditya6284 and Peter Latham.
Excited to present this work in Vancouver at #ICML2025 today 😀 Come by to hear about why in-context learning emerges and disappears. Talk: 10:30-10:45am, West Ballroom C. Poster: 11:00am-1:30pm, East Exhibition Hall A-B, #E-3409.
👋 Attending #ICML2025 next week? Don't forget to check out work involving our researchers!
New Anthropic Research: Project Vend. We had Claude run a small shop in our office lunchroom. Here’s how it went.
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient? Our work explains this & *predicts Transformer behavior throughout training* without access to its weights! 🧵 1/
Excited to share this work has been accepted as an Oral at #icml2025 -- looking forward to seeing everyone in Vancouver, and an extra thanks to my amazing collaborators for making this project so much fun to work on :)
some thoughts on human-ai relationships and how we're approaching them at openai. it's a long blog post -- tl;dr: we build models to serve people first. as more people feel increasingly connected to ai, we’re prioritizing research into how this impacts their emotional well-being.…
@ethansdyer and I have started a new team at @AnthropicAI — and we’re hiring! Our team is organized around the north star goal of building an AI scientist: a system capable of solving the long-term reasoning challenges and core capabilities needed to push the scientific…
We’re launching a research preview of Codex: a cloud-based software engineering agent that can work on many tasks in parallel. Rolling out to Pro, Enterprise, and Team users in ChatGPT starting today. chatgpt.com/codex
Some years ago, I got trapped in a Massive Trough of Imposter Syndrome. It took more than a year to dig myself out of it, but the following framework really helped me. It feels a bit vulnerable to share, but I hope it might help a few others too! A short thread 🧵🙂
We’re releasing PaperBench, a benchmark evaluating the ability of AI agents to replicate state-of-the-art AI research, as part of our Preparedness Framework. Agents must replicate top ICML 2024 papers, including understanding the paper, writing code, and executing experiments.