Adam Fisch
@adamjfisch
Research Scientist @ Google DeepMind | Formerly: PhD @ MIT EECS.
You need to evaluate an AI system and you have three things: 1. A cheap judge, which is noisy. 🙈 2. An expensive judge, which is accurate. 🧑‍⚖️ 3. A budget 💸 How should you spend the budget to get the best possible estimate of model quality? arxiv.org/abs/2506.07949
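(The gist, as a toy sketch rather than the paper's exact estimator: score everything with the cheap judge, score a paired subset with the expensive judge, and use the expensive scores to debias the cheap ones, prediction-powered-inference style. The fixed budget split below is a hypothetical placeholder; the paper's point is deriving the cost-optimal one.)

```python
import numpy as np

def ppi_quality_estimate(cheap_all, cheap_paired, expensive_paired):
    """Debiased quality estimate: cheap-judge mean over all samples,
    corrected by the mean gap to the expensive judge on a paired subset."""
    return cheap_all.mean() + (expensive_paired - cheap_paired).mean()

def split_budget(budget, cost_cheap, cost_expensive, frac_expensive=0.2):
    """Hypothetical fixed split of the budget between the two judges;
    choosing this fraction optimally is what the paper works out."""
    n_expensive = int(frac_expensive * budget / cost_expensive)
    n_cheap = int((1 - frac_expensive) * budget / cost_cheap)
    return n_cheap, n_expensive
```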
✨Huge thanks for the interest in Mixture-of-Recursions! Code is officially out! It's been a long journey exploring early-exiting with recursive architectures. I'll soon post my 👨‍🎓PhD thesis on Adaptive Computation too! Code: github.com/raymin0223/mix… Paper: arxiv.org/abs/2507.10524
🏋️‍♂️This unified MoR framework delivers strong performance at faster speeds. Check it out and ask any questions! Huge thanks to my awesome co-authors: @yujin301300 @reza_byt @kim_sungnyun @jenhriver @TalSchuster @adamjfisch @harhrayr Ziwei Ji @AaronCourville Se-Young Yun! 🥰
Huge thanks ❤️ to my awesome co-first authors @raymin0223 and @reza_byt, and to all our collaborators and supervisors who made this possible: @kim_sungnyun , @jenhriver, @TalSchuster, @adamjfisch, @harhrayr, Ziwei Ji, @AaronCourville, and Se-Young Yun.
Introducing our new work: 🚀Mixture-of-Recursions! 🪄We propose a novel framework that dynamically allocates recursion depth per token. 🪄MoR is an efficient architecture with fewer params, reduced KV cache memory, and 2× greater throughput, while maintaining comparable performance!
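(My loose reading, as a toy sketch rather than the released code: one weight-shared block is applied repeatedly, and a lightweight per-token router decides at each step whether a token keeps recursing. The real model processes only active tokens, so exited ones cost no compute or KV entries; this version just masks the update for clarity.)

```python
import torch
import torch.nn as nn

class ToyMoR(nn.Module):
    """Toy Mixture-of-Recursions: one shared layer applied up to
    max_rec times; a router scores each token at each step and
    tokens below a threshold exit early. Illustrative only."""
    def __init__(self, d_model=256, max_rec=4, threshold=0.5):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.router = nn.Linear(d_model, 1)  # per-token "keep recursing" score
        self.max_rec, self.threshold = max_rec, threshold

    def forward(self, x):                    # x: (batch, seq, d_model)
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_rec):
            h = self.block(x)                # same shared weights every step
            x = torch.where(active.unsqueeze(-1), h, x)  # exited tokens pass through
            keep = torch.sigmoid(self.router(x)).squeeze(-1) > self.threshold
            active = active & keep           # once exited, a token stays exited
            if not active.any():
                break
        return x
```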
Google DeepMind just dropped this new LLM architecture called Mixture-of-Recursions. It gets 2x inference speed, reduced training FLOPs, and ~50% reduced KV cache memory. Really interesting read. Has potential to be a Transformers killer.
📄 New Paper Alert! ✨ 🚀Mixture of Recursions (MoR): Smaller models • Higher accuracy • Greater throughput Across 135M–1.7B params, MoR carves a new Pareto frontier: equal training FLOPs yet lower perplexity, higher few-shot accuracy, and more than 2x throughput.…
Thanks for sharing our work, @deedydas MoR is a new arch that upgrades Recursive Transformers and Early-Exiting algorithms. Simple pretraining with a router, plus faster inference speed and smaller KV caches! A post with details and code will be released very soon. Stay tuned! ☺️
Accepted to COLM 2025!
Hi ho! New work: arxiv.org/pdf/2503.14481 With amazing collabs @jacobeisenstein @jdjdhekchbdjd @adamjfisch @ddua17 @fantinehuot @mlapata @vicky_zayats Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
Important topic, but this is more of a quick-start guide. For cutting-edge research on LLM evals, see these papers using Prediction-Powered Inference to incorporate synthetic data and model predictions for narrower CIs. 👇 Gemini already knows about them!…
New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: anthropic.com/research/stati…
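(The minimal version of the argument: a benchmark score is a sample mean over questions, so it comes with a standard error. A sketch of a CLT-based 95% interval, assuming i.i.d. questions:)

```python
import numpy as np

def eval_ci(scores, z=1.96):
    """Mean and 95% CI for an eval score, treating per-question
    scores (0/1 correctness or graded) as i.i.d. samples."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    return mean, (mean - z * se, mean + z * se)

# e.g. 500 questions at ~70% accuracy gives roughly ±4% at 95%
acc, (lo, hi) = eval_ci(np.random.binomial(1, 0.7, size=500))
```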
🚨 New Textbook on Conformal Prediction 🚨 arxiv.org/abs/2411.11824 “The goal of this book is to teach the reader about the fundamental technical arguments that arise when researching conformal prediction and related questions in distribution-free inference. Many of these…
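(For a taste of the subject: the split conformal construction at the heart of this literature fits in a few lines. A sketch for regression with absolute-residual scores:)

```python
import numpy as np

def split_conformal(model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal intervals: take a quantile of held-out absolute
    residuals, then pad test predictions by it. Gives (1 - alpha)
    marginal coverage under exchangeability."""
    scores = np.abs(y_cal - model.predict(X_cal))          # nonconformity scores
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)   # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    preds = model.predict(X_test)
    return preds - q, preds + q
```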
Check out our new paper on Recursive Transformers. Great having Sangmin here at @GoogleDeepMind to lead it! Particularly excited about the potential for continuous depth-wise batching for much better early-exiting batch throughput.
🚀 Excited to share our latest research @GoogleDeepMind on ♻️Recursive Transformers! We make smaller LMs by "sharing parameters" across layers. A novel serving paradigm, ✨Continuous Depth-wise Batching, with 🏃Early-Exiting could significantly boost their decoding speed! 🧵👇
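(A toy sketch of why parameter sharing enables the serving trick: since every depth runs the same block, hidden states at different depths can be stacked into one batched forward pass, and slots freed by early exits refill from the queue. The exit_fn hook and equal sequence lengths are simplifying assumptions, not the paper's implementation.)

```python
import torch

def serve(shared_block, requests, exit_fn, max_batch=8, max_depth=6):
    """Toy continuous depth-wise batching loop. shared_block could be,
    e.g., an nn.TransformerEncoderLayer(batch_first=True); requests are
    (1, seq, d_model) tensors of equal seq length; exit_fn (hypothetical)
    maps a hidden state to True when its token should exit early."""
    queue = list(requests)
    active, depths, outputs = [], [], []
    while queue or active:
        while queue and len(active) < max_batch:       # refill freed slots
            active.append(queue.pop(0)); depths.append(0)
        h = shared_block(torch.cat(active, dim=0))     # one mixed-depth batch
        active = list(h.split(1, dim=0))
        depths = [d + 1 for d in depths]
        keep_a, keep_d = [], []
        for x, d in zip(active, depths):
            if d >= max_depth or exit_fn(x):
                outputs.append(x)                      # done: frees a batch slot
            else:
                keep_a.append(x); keep_d.append(d)
        active, depths = keep_a, keep_d
    return outputs
```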
This work was led by the amazing @setlur_amrith during his internship at Google Research. With @nagpalchirag, @adamjfisch, @younggeng, @jacobeisenstein, @agarwl_, Alekh Agarwal, and @JonathanBerant.