Michael Matthews
@mitrma
RS Intern @AIatMeta | PhD student @FLAIR_Ox | ex @UCL and @Cambridge_Uni | working on RL in open-ended environments
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism-yet many questions remain open. Here's our overview of the field🧵
You work on RL from pixels, and you're tired to wait 10 hours for a DMC run to finish? Or up to 100 hours, if you add video distractors? Well, we got you covered : PixelBrax can run your continuous control experiments from pixels in < 1 hr! Come chat with @trevormcinroe and I at…
A couple bits of news: 1. Happy to share my first (human) NetHack ascension-next step is RL agents :) 2. I wrote a post discussing some @NetHack_LE challenges & how they map to open problems in RL & agentic AI. Still the best RL benchmark imo. mikaelhenaff.substack.com/p/first-nethac…
Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: arxiv.org/abs/2506.04168 Thread ↓
Hello World: My team at FAIR / @metaai (AI Research Agent) is looking to hire contractors across software engineering and ML. If you are interested and based in the UK, please fill in the following short EoI form: docs.google.com/forms/d/e/1FAI…
We are presenting Kinetix today! Oral - 11:30am Peridot Room 5F Poster - 3pm Hall 3+2B 377
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
🌹 Today we're releasing Unifloral, our new library for Offline Reinforcement Learning! We make research easy: ⚛️ Single-file 🤏 Minimal ⚡️ End-to-end Jax Best of all, we unify prior methods into one algorithm - a single hyperparameter space for research! ⤵️
I'll be in Singapore next week to present Kinetix as an Oral along with @mcbeukman. Reach out if you'd like to chat! 🇸🇬
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
I'll be attending ICLR next week to present Kinetix with @mitrma. Would love to chat about anything UED / Open-Ended RL / QD related, or interesting research in general :)
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
Did you know that \textcolor{white} text is still visible to LLMs? Anyway, don't use LLMs to write your reviews. Your co-authors will thank you.
Kinetix was featured on Computerphile!
Some time ago @computer_phile came to visit @FLAIR_Ox. I didn't have tons of time to prepare (too busy doing research!) so turning my verbiage into something semi coherent must have been difficult for the team. Here's the result -- judge for yourself: youtube.com/watch?v=fN3gdU…
Introducing M³: A 𝗠odular 𝗪orld 𝗠odel over streams of tokens for sample-efficient RL 🌍🤖 M³ achieves state-of-the-art performance for planning-free world models on Atari-100K 🕹️, DMC 🦾, and Craftax-1M! 🚀 🧵1/8
Jakob Foerster @j_foerst at @UniofOxford arguing that the AI community needs to avoid being goodharted by benchmarks.
1997: Deep Blue defeats Kasparov at chess 2016: AlphaGo masters the game of Go 2025: Stanford researchers crack Among Us Trending on alphaXiv 📈 Remarkable new work trains LLMs to master strategic social deduction through multi-agent RL, doubling win rates over standard RL.
⚔️ MiniHack Updates! ⚔️ 1️⃣ MiniHack 1.0.0 is here! Following popular demand, it now supports the new Gymnasium API and is built on NLE 1.1.0. Huge thanks to @Stephen_Oman (maintainer of @NetHack_LE ) for his outstanding contribution! 🙌
Kinetix has been accepted at ICLR as an Oral! See you in Singapore 🇸🇬
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
Congratulations to @antoine_dedieu @joeaortiz @sirbayes and the team for setting a new SOTA on Craftax-1M and Craftax-Classic-1M! 🎉
Happy to share our new preprint “Improving Transformer World Models for Data-Efficient RL”: arxiv.org/abs/2502.01591 We propose a ladder of improvements to model-based RL and achieve for the first time a superhuman reward on the challenging Craftax-classic benchmark! 1/10
Can AI agents adapt zero-shot, to complex multi-step language instructions in open-ended environments? We present MaestroMotif, a method for AI-assisted skill design that produces highly capable and steerable hierarchical agents. To the best of our knowledge, it is the first…
Couldn't agree more. "UK Research and Innovation funding in the UK fell under the previous government from 6,835 in 2018-19 to 4,900 in 2022-23". To give a concrete example (with my @UCLCS professor hat on): 4 out of 7 @UCL_DARK PhD students were funded by the Centre for Doctoral…
I'll also be at NeurIPS, keen to chat about UED, SFL, Kinetix, or anything in open-ended RL :)
Hello! I'll be at NeurIPS next week presenting our work on using learnability to select levels for RL autocurricula. If you're there, I would love to chat about curricula and RL generalisation more broadly. Please DM if you'd like to grab a coffee :)