Yoram Bachrach
@yorambac
Research Scientist at Meta (prev Google DeepMind and Microsoft Research). Working on LLM Agents and Multi-Agent Systems.
Super excited to share 🧠MLGym 🦾 – the first Gym environment for AI Research Agents 🤖🔬 We introduce MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. The key contributions of our work are: 🕹️ Enables the…
I just published the story of how I created the world’s first No-Limit Holdem poker solver and made $500k by age 23 medium.com/@olegostroumov… I had to keep the story secret since 2013, but now you can read how I went from near broke to reshaping the world's toughest poker games
Hiring! We're looking to fill contractor Research Engineer roles in New York City to work with us in FAIR on AI Research Agents. If that sounds fun, please fill out the expression of interest here: forms.gle/7m4fVqLXY5GwuL…
📢We show that continuous latent reasoning has a theoretical advantage over discrete token reasoning (arxiv.org/abs/2505.12514): For a graph with n vertices and graph diameter D, a two-layer transformer with D steps of continuous CoTs can solve the directed graph reachability…
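The claim above can be illustrated with a plain reachability check: if a graph has diameter D, then D frontier expansions (one hop each) are enough to decide whether a target is reachable, which mirrors why D continuous CoT steps suffice in the paper's construction. This is a minimal sketch of the underlying graph fact only, not the paper's transformer construction; the function name and example graph are illustrative.

```python
# Sketch: D frontier expansions decide directed reachability when D >= diameter.
# Each expansion is one "step", analogous to one continuous CoT step.

def reachable_within_d_steps(edges, source, target, d):
    """True iff `target` is reachable from `source` within `d` hops."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)   # build adjacency lists
    reached = {source}
    for _ in range(d):                    # one hop of expansion per step
        reached |= {v for u in reached for v in adj.get(u, ())}
        if target in reached:
            return True
    return target in reached

# Example: path 0 -> 1 -> 2 -> 3, so the diameter is 3.
edges = [(0, 1), (1, 2), (2, 3)]
print(reachable_within_d_steps(edges, 0, 3, 3))  # True: 3 hops reach vertex 3
print(reachable_within_d_steps(edges, 0, 3, 2))  # False: 2 hops are not enough
```

The key point is that the number of sequential steps needed scales with the diameter D, not with the vertex count n.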
Excited to release AlgoTune!! It's a benchmark and coding agent for optimizing the runtime of numerical code 🚀 algotune.io 📚 algotune.io/paper.pdf 🤖 github.com/oripress/AlgoT… with @OfirPress @ori_press @PatrickKidger @b_stellato @ArmanZharmagam1 & many others 🧵
AI Research Agents are becoming proficient at machine learning tasks, but how can we help them search the space of candidate solutions and codebases? Read our new paper looking at MLE-Bench: arxiv.org/pdf/2507.02554 #LLM #Agents #MLEBench

Our research on embodied AI agents that can perceive, learn, act and interact in the virtual and physical worlds. #metaAI #AIAgent #embodied #worldmodel #superintelligence arxiv.org/abs/2506.22355
Love this project: nanoGPT -> recursive self-improvement benchmark. Good old nanoGPT keeps on giving and surprising :) - First I wrote it as a small little repo to teach people the basics of training GPTs. - Then it became a target and baseline for my port to direct C/CUDA…
Recently, there has been a lot of talk of LLM agents automating ML research itself. If Llama 5 can create Llama 6, then surely the singularity is just around the corner. How can we get a pulse check on whether current LLMs are capable of driving this kind of total…
This project was co-led by @BingChenZhao2, @MarlaMagka and myself, with the support of a tremendous team under @yorambac and @j_foerst. Read the full paper detailing the benchmark design and our findings here: arxiv.org/abs/2506.22419
🚨Self-Challenging Language Model Agents🚨 📝: arxiv.org/abs/2506.01716 A new paradigm to train LLM agents to use different tools with challenging self-generated data ONLY: Self-challenging agents (SCA) both propose new tasks and solve them, using self-generated verifiers to…
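The loop described above can be sketched as: the same model proposes a task, writes a verifier for it, then attempts the task, and only self-verified solutions are kept as training data. This is a hedged sketch of the idea as stated in the post, not the paper's actual implementation; all function names and the toy task are illustrative stand-ins.

```python
# Sketch of a self-challenging round: propose a task, write a verifier,
# attempt the task, and keep the solution only if the verifier accepts it.

def self_challenging_round(propose_task, write_verifier, solve, max_attempts=4):
    task = propose_task()            # model generates a new task
    verifier = write_verifier(task)  # model writes a checker for that task
    for _ in range(max_attempts):
        attempt = solve(task)        # model tries to complete the task
        if verifier(attempt):        # keep only self-verified solutions
            return (task, attempt)   # -> candidate training example
    return None                      # unverified attempts are discarded

# Toy instantiation: the "task" is a target sum, the verifier checks it.
result = self_challenging_round(
    propose_task=lambda: 7,
    write_verifier=lambda t: (lambda ans: sum(ans) == t),
    solve=lambda t: [3, 4],
)
print(result)  # (7, [3, 4])
```

The design point is that task generation, verification, and solving all come from the model itself, so no externally labeled data is needed.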
Hello World: My team at FAIR / @metaai (AI Research Agent) is looking to hire contractors across software engineering and ML. If you are interested and based in the UK, please fill in the following short EoI form: docs.google.com/forms/d/e/1FAI…
Come join us! We have a crack team across US + UK (@yorambac) working on agents that can do AI research. We're hiring a full-time PhD new grad Research Scientist based in New York. Ideal candidate has published on RL / reasoning with LLMs.
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution arxiv.org/abs/2502.18449 by @YuxiangWei9 @sidawxyz and the whole team! Get started with your favorite model here github.com/facebookresear…
Tired of using FID for evaluating generative models? Come to our #NeurIPS2023 poster on FLS, a new complete metric for generative models that also penalizes overfitting! neurips.cc/virtual/2023/p… github.com/marcojira/fls @bose_joey @drimgemp Chongli Qin @yorambac @gauthier_gidel
How can metrics for evaluating generative models take into account generalization? In our new paper, we propose a new sample-based metric to address exactly this challenge: the Feature Likelihood Score (FLS). Paper: arxiv.org/abs/2302.04440 Github: github.com/marcojira/fls 1/12
What do haggling, debate, and convincing your kids to go to bed all have in common with Poker? With #LLMs, we map them all onto the framework of #gametheory; we then generate conversational strategies using the same methods that beat top Poker pros. arxiv.org/abs/2402.01704
Student researcher positions at @GoogleDeepMind are now open for applications until Dec 15 – see our careers webpage. Also a good opportunity to re-share my article of how I prepared for my internship back in 2019: davidstutz.de/how-i-prepared…
⚽🌐🕸️🤖 arXiv:2310.10553 arxiv.org/abs/2310.10553