Jesse Farebrother
@JesseFarebro
Ph.D. student studying AI & decision making at @Mila_Quebec / @McGillU. Currently at @AIatMeta. Previously @GoogleDeepMind, @Google 🧠.
Honored that our paper Temporal Difference Flows received the Best Paper Award at the #ICLR2025 World Models Workshop, and has also been accepted as a spotlight for #ICML2025! All made possible with the exceptional team @AIatMeta! 📄arxiv.org/abs/2503.09817 x.com/JesseFarebro/s…
3) At the World Models workshop, I'll be giving an oral on a new approach to learning a generative model of successor states through flow matching / diffusion. 📍Peridot 201 & 206 📅Mon 28 Apr 5 PM - 5:30 PM Check out the paper on arXiv: arxiv.org/abs/2503.09817 with a full…
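For readers who want a feel for the mechanics, here is a minimal sketch of plain conditional flow matching over sampled successor states (deliberately not the paper's TD-based objective; see the arXiv link above for that). The network, shapes, and sampling below are illustrative assumptions.

import torch
import torch.nn as nn

state_dim = 8

# Velocity field v_theta(x_t, t | s): learns to transport noise toward a future state s'.
velocity_net = nn.Sequential(
    nn.Linear(2 * state_dim + 1, 256), nn.ReLU(),
    nn.Linear(256, state_dim),
)
opt = torch.optim.Adam(velocity_net.parameters(), lr=3e-4)

def flow_matching_loss(s, s_future):
    # Conditional flow-matching loss for modeling p(s_future | s).
    x1 = s_future                               # target successor state
    x0 = torch.randn_like(x1)                   # noise sample
    t = torch.rand(x1.shape[0], 1)              # interpolation time in [0, 1]
    xt = (1 - t) * x0 + t * x1                  # linear interpolation path
    target_v = x1 - x0                          # velocity of that path
    pred_v = velocity_net(torch.cat([xt, t, s], dim=-1))
    return ((pred_v - target_v) ** 2).mean()

# Illustrative usage: s_future would be drawn from the geometrically-discounted
# future of s along a trajectory; here both are random placeholders.
s, s_future = torch.randn(64, state_dim), torch.randn(64, state_dim)
loss = flow_matching_loss(s, s_future)
opt.zero_grad(); loss.backward(); opt.step()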
Turns out @hugo_larochelle was ahead of his time once again with these cards existing in the wild 😄 thestar.com/opinion/contri…
did someone say AI researcher baseball cards 👀
As an undergraduate student, taking @RichardSSutton’s course at @UAlberta was a defining moment in my academic journey. His work and teachings have shaped the paths of countless researchers, including my own. Congrats, Rich & Andy!
Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD
Curious about a simple and scalable approach to multi-turn code generation? Come check out μCode — our framework built on one-step recoverability and multi-turn best-of-N (BoN) search. Stop by and say hi during Poster Session 4 at #ICML2025 today, East Hall A-B, poster #E-2600.
Coding agents can debug their own outputs, but what if none of the fixes are correct? We overcome sparse rewards by making them continuous 📈 Instead of relying on binary execution rewards, we introduce a learned verifier that measures how close the current solution is to a correct one 📏
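A rough sketch of the general idea in these two posts, under stated assumptions: the helper names below (generate_candidates, verifier) are hypothetical, and the verifier stands in for any learned scorer returning a continuous value in [0, 1] rather than a binary pass/fail execution outcome.

from typing import Callable, List, Tuple

def best_of_n_step(
    prompt: str,
    generate_candidates: Callable[[str, int], List[str]],  # code LLM / policy (hypothetical)
    verifier: Callable[[str, str], float],                  # learned scorer in [0, 1] (hypothetical)
    n: int = 8,
) -> Tuple[str, float]:
    # Sample N candidate fixes, score each with the learned verifier, and keep
    # the best one; the score acts as a continuous reward instead of a 0/1
    # execution result, so progress is visible even when every fix still fails.
    candidates = generate_candidates(prompt, n)
    scored = [(verifier(prompt, c), c) for c in candidates]
    best_score, best_candidate = max(scored)
    return best_candidate, best_score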
Heading to Vancouver for #ICML2025 to present our work: Temporal Difference Flows. Make sure to check out the oral to learn how we’re now able to scale this exciting world model framework based on the successor representation! Also, feel free to reach out to discuss anything RL!
Late update: I’ve moved to the Bay Area for a 6-month research fellowship at @AnthropicAI! I’d be glad to meet other researchers working on RL for language models, agents, subtle and unverifiable rewards, etc. — DMs open.
Exciting PhD position open at FAIR in Paris. We are looking for a candidate to join our team and contribute to advancing the field of AI, especially reinforcement learning. Find more details and apply below. Feel free to reach out to me by email. metacareers.com/jobs/192266079…
Say ahoy to 𝚂𝙰𝙸𝙻𝙾𝚁⛵: a new paradigm of *learning to search* from demonstrations, enabling test-time reasoning about how to recover from mistakes w/o any additional human feedback! 𝚂𝙰𝙸𝙻𝙾𝚁 ⛵ outperforms Diffusion Policies trained via behavioral cloning on 5-10x the data!
Take a look at this amazing piece of work by my student @JesseFarebro - a new kind of world model based on successor representations that's a lot more robust than prior iterations. Incredible to see all the progress we've made in the last 5 years in RL.
📢 Come say hi at our SFM poster at #ICLR2025, Poster Session 5 – #572! We’re presenting a method for Inverse Reinforcement Learning via Successor Feature Matching — a non-adversarial approach that works without action labels. Excited to share and chat!
🚀 Excited to share SFM, a method for IRL via direct policy optimization through a successor feature matching loss. Incredible collaboration with @harwiltz, @JesseFarebro, @irinarish, @GlenBerseth, and @sanjibac. Paper: arxiv.org/abs/2411.07007 Code: github.com/arnavkj1995/SFM 🧵⬇️
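To make the objective concrete, here is a minimal sketch of a generic successor-feature-matching loss (my reading of the idea, not the paper's exact algorithm); phi, gamma, and the rollout tensors below are illustrative assumptions. Note that only expert states are needed, with no action labels.

import torch

gamma = 0.99

def successor_features(states, phi):
    # Discounted sum of state features along one trajectory: sum_t gamma^t * phi(s_t).
    feats = phi(states)                                     # [T, d]
    discounts = gamma ** torch.arange(feats.shape[0], dtype=feats.dtype)
    return (discounts.unsqueeze(-1) * feats).sum(dim=0)     # [d]

def sfm_loss(policy_states, expert_states, phi):
    # Match the agent's successor features to the expert's; the gradient is
    # taken w.r.t. the policy that produced policy_states.
    psi_pi = successor_features(policy_states, phi)
    psi_expert = successor_features(expert_states, phi)
    return ((psi_pi - psi_expert) ** 2).sum()

# Illustrative usage with random data and a linear feature map:
phi = torch.nn.Linear(8, 16)
loss = sfm_loss(torch.randn(100, 8), torch.randn(100, 8), phi)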
At Reliant we've found RL to be incredibly efficient at improving answer quality on the life sciences' hardest questions. Today we're putting out our work on LLM fine-tuning with off-policy RL, matching Llama 70B performance with an 8B model - take a look! arxiv.org/abs/2503.14286
Don’t miss this amazing opportunity to work with Pablo at @GoogleDeepMind—one of the highlights of my PhD. He’s an incredible mentor, and I can’t say enough good things about working with him!
Looking to hire a student researcher to work on a cool project for 6 months at DeepMind Montreal. Reqs:
- Full-time master's/PhD student 🧑🏾🎓
- Substantial expertise in multi-agent RL, ideally including publication(s) 🤖🤖
- Strong Python coding skills 🐍
This you? Get in touch!
📣📣 My team at Google DeepMind is hiring a student researcher for summer/fall 2025 in Seattle! If you're a PhD student interested in getting deep RL to (finally) work reliably in interesting domains, apply at the link below and reach out to me via email so I know you applied 👇
If you are interested in working with me at *the* RL powerhouse @UAlberta on robot learning on physical robots, please drop me a message. Retweets welcome 🙏
Going through my BSc and MSc studies in Brazil, I would hear about Turing Award winners. Those were not real people to me; they were mythological figures, so far from me. Now Rich has won it! Thank you, Rich. You have no idea how meaningful this is to me. nytimes.com/2025/03/05/tec…