Mikayel Samvelyan
@_samvelyan
Research Scientist @GoogleDeepMind. Previously @Meta (FAIR), PhD @UCL, MSc @UniofOxford. @ELLISforEurope member.
Excited to give an invited talk on Agent Learning in Open-Endedness at the @IMOLNeurIPS2024 workshop this Sunday. I'll be joining an amazing lineup of speakers. Hope to see you there! 📅 Sunday, Dec 15 🕚 11:35 - 12:15 📍 West Meeting Room 217-219
We're at #NeurIPS🇨🇦! Check out our updated Sunday schedule: imol-workshop.github.io/pages/program/
An exceptional opportunity with brilliant @robertarail and an amazing team at @GoogleDeepMind! 🚀 If pushing the frontiers of open-ended discovery excites you, this is the place to be. 🔥
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an…
LLMs acing math olympiads? Cute. But BALROG is where agents fight dragons (and actual Balrogs)🐉😈 And today, Grok-4 (@grok) takes the gold 🥇 Welcome to the podium, champion!
We’re excited to announce our next speaker: Roberta Raileanu (@robertarail) from @GoogleDeepMind! Roberta will discuss NetHack: A Grand Challenge for RL and LLM Agents Alike. ⚔️ Join us on August 5th to learn how to develop agents capable of tackling open-ended environments!
Much-needed multi-agent benchmark for LLMs 👥 Theory of Mind is key as LLMs act in agentic, interactive settings — yet remains underexplored and hard to measure. 💽 Decrypto offers a ToM-based evaluation of reasoning for agents operating in complex social settings. Great work!
Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
Happy "@NetHack_LE is still completely unsolved" day for those of you who are celebrating it. We released The NetHack Learning Environment (arxiv.org/abs/2006.13760) on this day five years ago. Current frontier models achieve only ~1.7% progression (see balrogai.com).…
Check out Alex’s amazing internship project using Quality-Diversity algorithms to create synthetic reasoning problems! 👇 💡Key takeaway: better data quality improves in-distribution results, while more diversity enhances out-of-distribution generalization.
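For intuition, here is a minimal MAP-Elites-style sketch of the quality-diversity idea behind this kind of data generation (my own toy rendering, not the project's code): each descriptor cell of an archive keeps only its highest-quality problem, so coverage grows more diverse while each entry stays as good as possible.

```python
# Hedged sketch of a MAP-Elites-style archive for synthetic reasoning
# problems. Descriptors and quality scores here are made up for illustration.
def update_archive(archive, problem, descriptor, quality):
    """archive: dict mapping a (discretized) descriptor to (problem, quality).
    A new problem only displaces the incumbent in its cell if it is better."""
    best = archive.get(descriptor)
    if best is None or quality > best[1]:
        archive[descriptor] = (problem, quality)

archive = {}
update_archive(archive, "2-step algebra word problem", ("algebra", 2), 0.8)
update_archive(archive, "harder 2-step algebra problem", ("algebra", 2), 0.9)
print(archive[("algebra", 2)][0])  # the higher-quality problem wins the cell
```

Diversity comes from filling many distinct cells; quality comes from the per-cell competition.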
Excited to announce the final paper of my PhD!📢 A crucial piece of SFT/RL training is the availability of high-quality problem-solution data (Q, A). But what to do for difficult tasks where such data is scarce/hard to generate with SOTA models? Read on to find out
Excited to introduce LLM-First Search (LFS) - a new paradigm where the language model takes the lead in reasoning and search! LFS is a self-directed search method that empowers LLMs to guide the exploration process themselves, without relying on predefined heuristics or fixed…
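A rough sketch of what an "LLM-first" search loop could look like, under my own assumptions (the `llm_expand` and `llm_decide` callables are hypothetical placeholders, not the paper's API): the model both proposes next steps and decides which partial paths deserve further exploration, with no hand-written heuristic in the loop.

```python
# Hedged sketch: the LLM, not a fixed heuristic, steers the search.
def llm_first_search(problem, llm_decide, llm_expand, budget=50):
    frontier = [[problem]]              # stack of partial reasoning paths
    while frontier and budget > 0:
        path = frontier.pop()
        budget -= 1
        for step in llm_expand(path):   # LLM proposes candidate next steps
            new_path = path + [step]
            if step.get("is_solution"):
                return new_path
            # The LLM itself judges whether this branch is worth
            # pursuing now or should be dropped.
            if llm_decide(new_path) == "continue":
                frontier.append(new_path)
    return None
```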
🚀Introducing “StochasTok: Improving Fine-Grained Subword Understanding in LLMs”!🚀 LLMs are incredible but still struggle disproportionately with subword tasks, e.g., for character counts, wordplay, multi-digit numbers, fixing typos… Enter StochasTok, led by @anyaasims! [1/]
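The core trick, as I read the announcement (a hedged sketch; the function name and signature are mine, not the paper's): during tokenization, occasionally replace a token with a pair of sub-tokens that detokenize to the same string, so the model sees the same text under many segmentations and learns its subword structure.

```python
import random

def stochastok(token_ids, pair_splits, split_prob=0.1):
    """pair_splits: dict mapping a token id to a list of (left_id, right_id)
    pairs whose strings concatenate back to the original token (assumed
    precomputed from the vocabulary)."""
    out = []
    for tid in token_ids:
        options = pair_splits.get(tid)
        if options and random.random() < split_prob:
            out.extend(random.choice(options))  # emit an equivalent split
        else:
            out.append(tid)
    return out

# Toy usage: token 7 ("hello") can split into (3, 4) ("hel" + "lo").
print(stochastok([7, 9], {7: [(3, 4)]}, split_prob=1.0))  # -> [3, 4, 9]
```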
What an enormous privilege to give the opening lecture at the OxML summer school this morning. Never have I had such a thought-provoking set of audience questions! Here's to the automation of innovation towards human flourishing alongside the next generation of researchers.
📣 We’re excited to kick off the course today with a fantastic line-up of speakers:
Edward Hughes (Google DeepMind) – AI Squared: Towards AI Capable of AI Research
Karo Moilanen (Moonsong Labs) – Agent Guardrails and Proof-of-Agenthood Topologies
Peter Gostev (Moonpig) –…
Schmidhuber's Gödel Machine, an AI that rewrites its own code whenever the change is provably useful, captured the dream of recursive self-improvement 🔄 Thrilled to share our practical realization, inspired by Darwinian evolution! Done with the amazing @jennyzhangzt, @shengranhu, @RobertTLange, and @jeffclune 😍
Introducing The Darwin Gödel Machine: AI that improves itself by rewriting its own code sakana.ai/dgm The Darwin Gödel Machine (DGM) is a self-improving agent that can modify its own code. Inspired by evolution, we maintain an expanding lineage of agent variants,…
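A minimal sketch of the outer loop as the announcement describes it (the `self_modify` and `evaluate` callables are hypothetical placeholders, and the real parent-sampling scheme is likely more sophisticated than uniform choice): maintain an expanding archive of agent variants, sample a parent, let it rewrite its own code, and keep any working child, validating empirically on a benchmark rather than by proof.

```python
import random

def dgm_loop(initial_agent, self_modify, evaluate, iterations=100):
    # Archive of (agent, score): an expanding lineage, not a single lineage tip.
    archive = [(initial_agent, evaluate(initial_agent))]
    for _ in range(iterations):
        parent, _ = random.choice(archive)  # any variant may seed progress
        child = self_modify(parent)         # the agent edits its own code
        score = evaluate(child)             # empirical check on a benchmark
        if score is not None:               # keep every working variant,
            archive.append((child, score))  # not just strict improvements
    return max(archive, key=lambda entry: entry[1])
```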
One promising direction is combining ideas from AlphaEvolve and the Darwin Gödel Machine. Imagine a self-referential system improving itself even at the lowest algorithmic levels at *scale* AlphaEvolve: deepmind.google/discover/blog/… Darwin Gödel Machine: arxiv.org/abs/2505.22954
Proud to announce that Dr @akbirkhan defended his PhD thesis titled "Safe Automated Research" last week 🥳. Massive thanks to @mpshanahan and Pontus Stenetorp for examining! As is customary, Akbir received a personal mortarboard from @UCL_DARK. Details 👇
2025 is the year of open-endedness. Delighted to be giving a talk at RAAIS in a couple of weeks’ time!
"open-endedness is all we'll need"...this is the study of a system’s ability to continuously generate artifacts that are both novel and learnable to an observer as a route to agi. excited to have @edwardfhughes from @GoogleDeepMind's open-endedness team join us at @raais 2025!
"open-endedness is all we'll need"...this is the study of a system’s ability to continuously generate artifacts that are both novel and learnable to an observer as a route to agi. excited to have @edwardfhughes from @GoogleDeepMind's open-endedness team join us at @raais 2025!
The bar AI has yet to reach.
Happy to announce the latest release of @NetHack_LE (version 1.2.0). You can now use the seed function to make the dungeon layout reproducible across training episodes. In-level interaction and combat are still randomly determined and do not affect the layouts of lower levels.
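A minimal usage sketch, assuming NLE's Gym-style API (the exact seed signature may vary by version):

```python
import gym
import nle  # noqa: F401  (importing registers the NetHack environments)

env = gym.make("NetHackScore-v0")
# Fixing the core and display seeds makes the dungeon layout identical
# across episodes; reseed=False stops NLE from drawing fresh seeds on
# every reset.
env.seed(core=42, disp=42, reseed=False)
obs = env.reset()
```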
Google's AI just made math discoveries NO human has!
— Solved optimal packing of 11 and 12 hexagons in hexagons.
— Reduced 4x4 matrix multiplication from 49 operations to 48 (first advance in 56 years!)
and many more.
AlphaEvolve is the AlphaGo 'move 37' moment for math. Insane.
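For context on why 48 matters (a worked count using the well-known recursion, not AlphaEvolve's actual tensor decomposition): Strassen's 1969 algorithm multiplies 2x2 matrices with 7 scalar multiplications, so applying it recursively to 4x4 gives 7² = 49; AlphaEvolve's rank-48 decomposition, for complex-valued matrices, beats that.

```python
def strassen_mults(n):
    """Scalar multiplications for an n x n matrix product via recursive
    Strassen (n a power of two): T(n) = 7 * T(n / 2), T(1) = 1."""
    return 1 if n == 1 else 7 * strassen_mults(n // 2)

print(strassen_mults(4))  # 49: the 56-year-old baseline that 48 improves on
```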
Huge congratulations to my academic sister Laura on getting a postdoc position at MIT! 🧠✨ So proud of everything she’s achieved — can’t wait to see all the amazing things she’ll do there. 🚀
Excited to announce that this fall I'll be joining @jacobandreas's amazing lab at MIT for a postdoc to work on interpretability for reasoning (with @ev_fedorenko 🤯 among others). Cannot wait to think more about this direction in such a dream academic context!
I always took the bitter lesson to mean "only work on whatever scales", not "never be in the loop".
It ain't "The Bitter Lesson" if you are in the loop curating the training data for your LLM, y'all. Pick your lesson, will ya? #SundayHarangue (h/t @kayastechly)
Our @UCL_DARK MSc student @_yixu managed to get his work accepted as a spotlight paper at @icmlconf 2025 (top 2.6% of submissions) 🚀 What an amazing success, and a testament to the outstanding supervision by @_robertkirk and @LauraRuis.
🔥Are we ranking LLMs correctly?🔥 Large Language Models (LLMs) are widely used as automatic judges, but what if their rankings are unstable? 😯 Our latest study finds non-transitivity in LLM-as-a-judge evaluations, where A > B, B > C, but… C > A?! 🔄
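A toy illustration of the failure mode (the judgments below are made up; any cycle like this means the pairwise preferences admit no consistent total ranking):

```python
from itertools import permutations

# judge[(x, y)] == x means the judge preferred x over y in that pairing.
judge = {("A", "B"): "A", ("B", "C"): "B", ("A", "C"): "C"}

def prefers(x, y):
    return judge.get((x, y), judge.get((y, x))) == x

# A beats B, B beats C, yet C beats A.
for a, b, c in permutations("ABC"):
    if prefers(a, b) and prefers(b, c) and prefers(c, a):
        print(f"cycle found: {a} > {b} > {c} > {a}")
        break
```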