Roberta Raileanu
@robertarail
Senior Staff Research Scientist @GoogleDeepMind & Honorary Lecturer @UCL. ex @Meta|@MSFTResearch|@NYU|@Princeton. Llama-3, Toolformer, Rainbow Teaming, MLGym.
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an…
Come work with us on helping agents reach unprecedented levels of autonomy :) We have a ridiculously cracked team and are pushing the bounds of AGI with awesome folks like @robertarail @OriolVinyalsML @_rockt @quocleix Come join ~~ le rocket ship ~~
MLGym has been accepted to #COLM2025! See you in Montreal 🇨🇦
🎉 Thrilled to share MLGym and MLGym-Bench, our new framework for AI Research Agents! 🚀 Developed during my Meta internship, MLGym provides a flexible environment for benchmarking and developing new agents for AI research tasks. 🔬 MLGym-Bench consists of 13 diverse AI research…
Our team @GoogleDeepMind is hiring! Join a team of world-class researchers working on open-ended self-improvement! 🔥
If you are interested in open-ended discovery, this is an amazing opportunity! @robertarail is great and you will have fun working on a challenging problem
Join us at the Computer Use Agents workshop at #ICML2025. Happening now in West Meeting Room 211, Vancouver Convention Centre! We have a day packed with fantastic invited and contributed talks, posters, and discussions!
How should we rank generalist agents on a wide set of benchmarks and tasks? Honored to get the AAMAS best paper award for SCO, a scheme based on voting theory which minimizes the mistakes in predicting agent comparisons based on the evaluation data. arxiv.org/abs/2411.00119
Really excited to share our recent work combining open-ended foundation model innovation with the competitive dynamics of self-play!! One of the ingredients towards a creativity explosion? Led by @_aadharna, done together with @jeffclune 🚀
Thrilled to introduce Foundation Model Self-Play, led by @_aadharna. FMSPs combine the intelligence & code generation of foundation models with the curriculum of self-play & principles of open-endedness to explore diverse strategies in multi-agent games, like the one below 🧵👇
AIRA strikes again! This time we conduct an in-depth study of research agents on MLE-Bench (i.e., Kaggle competitions). We find that while exploration and search matter, the biggest delta is due to our more robust software stack. We are open-sourcing all of this to allow YOU to…
AI Research Agents are becoming proficient at machine learning tasks, but how can we help them search the space of candidate solutions and codebases? Read our new paper looking at MLE-Bench: arxiv.org/pdf/2507.02554 #LLM #Agents #MLEBench
Scaling AI research agents is key to tackling some of the toughest challenges in the field. But what's required to scale effectively? It turns out that simply throwing more compute at the problem isn't enough. We break down an agent into four fundamental components that shape…
Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. How can we get a pulse check on whether current LLMs are capable of driving this kind of total…
My team is hiring a Technical Program Manager to help organize, accelerate, and empower world-class research. High-impact, high-growth role for someone passionate about AI and great at making things happen.
We are hiring a Technical Program Manager to organize and enable our research teams to be the best at what they do and to make fast-paced progress towards our mission of building AGI responsibly. Ideal candidates should have a demonstrable record of strong program management…
Excellent work from @Dahoas1 & co. on generating both problems and solutions to improve LLM reasoning, with lots of interesting insights. Great to see more work using open-ended methods for self-improvement 🚀
Excited to announce the final paper of my PhD!📢 A crucial piece of SFT/RL training is the availability of high-quality problem-solution data (Q, A). But what to do for difficult tasks where such data is scarce/hard to generate with SOTA models? Read on to find out
We are excited to host @robertarail at the UCL Jump Trading/ELLIS CSML Seminar Series @uclcsml @ai_ucl @uclcs tomorrow 12pm UK time. Title: “Automating Scientific Discovery: How Far Are We?” Details and zoom link: ucl-ellis.github.io/dm_csml_semina… Please sign up to join in person 👇
Hello Gemini 2.5 Flash-Lite! So fast, it codes *each screen* on the fly (Neural OS concept 👇). The frontier isn't always about large models and beating benchmarks. In this case, a super fast & good model can unlock drastic use cases. Read more: blog.google/products/gemin…
📢 New paper introducing LLM-First Search (LFS) — a new self-guided search method with LLMs 🤖 🧠 🚀. Excellent work led by @naitherr with @_rockt from @UCL_DARK. LFS can more effectively explore the solution space of challenging reasoning problems, leading to better performance…
Excited to introduce LLM-First Search (LFS) - a new paradigm where the language model takes the lead in reasoning and search! LFS is a self-directed search method that empowers LLMs to guide the exploration process themselves, without relying on predefined heuristics or fixed…