Adam Fisch
@adamjfisch
Research Scientist @ Google DeepMind | Formerly: PhD @ MIT EECS.
You need to evaluate an AI system and you have three things: 1. A cheap judge, which is noisy. 🙈 2. An expensive judge, which is accurate. 🧑‍⚖️ 3. A budget 💸 How should you spend the budget to get the best possible estimate of model quality? arxiv.org/abs/2506.07949
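(The gist, as a toy sketch rather than the paper's exact estimator: score everything with the cheap judge, score a paired subset with the expensive judge, and use the expensive scores to debias the cheap ones, prediction-powered-inference style. The fixed budget split below is a hypothetical placeholder; the paper's point is deriving the cost-optimal one.)

```python
import numpy as np

def ppi_quality_estimate(cheap_all, cheap_paired, expensive_paired):
    """Debiased quality estimate: cheap-judge mean over all samples,
    corrected by the mean gap to the expensive judge on a paired subset."""
    return cheap_all.mean() + (expensive_paired - cheap_paired).mean()

def split_budget(budget, cost_cheap, cost_expensive, frac_expensive=0.2):
    """Hypothetical fixed split of the budget between the two judges;
    choosing this fraction optimally is what the paper works out."""
    n_expensive = int(frac_expensive * budget / cost_expensive)
    n_cheap = int((1 - frac_expensive) * budget / cost_cheap)
    return n_cheap, n_expensive
```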
✨Huge thanks for the interest in Mixture-of-Recursions! Code is officially out! It's been a long journey exploring early-exiting with recursive architectures. I'll soon post my 👨‍🎓PhD thesis on Adaptive Computation too! Code: github.com/raymin0223/mix… Paper: arxiv.org/abs/2507.10524
🏋️‍♂️This unified MoR framework delivers strong performance at faster speeds. Check it out and ask any questions! Huge thanks to my awesome co-authors: @yujin301300 @reza_byt @kim_sungnyun @jenhriver @TalSchuster @adamjfisch @harhrayr Ziwei Ji @AaronCourville Se-Young Yun! 🥰
Huge thanks ❤️ to my awesome co-first authors @raymin0223 and @reza_byt, and to all our collaborators and supervisors who made this possible: @kim_sungnyun , @jenhriver, @TalSchuster, @adamjfisch, @harhrayr, Ziwei Ji, @AaronCourville, and Se-Young Yun.
Introducing our new work: 🚀Mixture-of-Recursions! 🪄We propose a novel framework that dynamically allocates recursion depth per token. 🪄MoR is an efficient architecture with fewer params, reduced KV cache memory, and 2× greater throughput, while maintaining comparable performance!
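(My loose reading, as a toy sketch rather than the released code: one weight-shared block is applied repeatedly, and a lightweight per-token router decides at each step whether a token keeps recursing. The real model processes only active tokens, so exited ones cost no compute or KV entries; this version just masks the update for clarity.)

```python
import torch
import torch.nn as nn

class ToyMoR(nn.Module):
    """Toy Mixture-of-Recursions: one shared layer applied up to
    max_rec times; a router scores each token at each step and
    tokens below a threshold exit early. Illustrative only."""
    def __init__(self, d_model=256, max_rec=4, threshold=0.5):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.router = nn.Linear(d_model, 1)  # per-token "keep recursing" score
        self.max_rec, self.threshold = max_rec, threshold

    def forward(self, x):                    # x: (batch, seq, d_model)
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_rec):
            h = self.block(x)                # same shared weights every step
            x = torch.where(active.unsqueeze(-1), h, x)  # exited tokens pass through
            keep = torch.sigmoid(self.router(x)).squeeze(-1) > self.threshold
            active = active & keep           # once exited, a token stays exited
            if not active.any():
                break
        return x
```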
Google DeepMind just dropped this new LLM architecture called Mixture-of-Recursions. It gets 2x inference speed, reduced training FLOPs, and ~50% reduced KV cache memory. Really interesting read. Has potential to be a Transformers killer.
📄 New Paper Alert! ✨ 🚀Mixture of Recursions (MoR): Smaller models • Higher accuracy • Greater throughput Across 135M–1.7B params, MoR carves a new Pareto frontier: equal training FLOPs yet lower perplexity, higher few-shot accuracy, and more than 2x throughput.…
Thanks for sharing our work, @deedydas MoR is a new arch that upgrades Recursive Transformers and Early-Exiting algorithms. Simple pretraining with a router, plus faster inference speed and smaller KV caches! A post with details and code will be released very soon. Stay tuned! ☺️
Accepted to COLM 2025!
Hi ho! New work: arxiv.org/pdf/2503.14481 With amazing collabs @jacobeisenstein @jdjdhekchbdjd @adamjfisch @ddua17 @fantinehuot @mlapata @vicky_zayats Some things are easier to learn in a social setting. We show agents can learn to faithfully express their beliefs (along... 1/3
Important topic, but this is more of a quick-start guide. For cutting-edge research on LLM evals, see these papers using Prediction-Powered Inference to incorporate synthetic data and model predictions for narrower CIs. 👇 Gemini already knows about them!…
New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: anthropic.com/research/stati…
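(The minimal version of the argument: a benchmark score is a sample mean over questions, so it comes with a standard error. A sketch of a CLT-based 95% interval, assuming i.i.d. questions:)

```python
import numpy as np

def eval_ci(scores, z=1.96):
    """Mean and 95% CI for an eval score, treating per-question
    scores (0/1 correctness or graded) as i.i.d. samples."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    return mean, (mean - z * se, mean + z * se)

# e.g. 500 questions at ~70% accuracy gives roughly ±4% at 95%
acc, (lo, hi) = eval_ci(np.random.binomial(1, 0.7, size=500))
```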
🚨 New Textbook on Conformal Prediction 🚨 arxiv.org/abs/2411.11824 “The goal of this book is to teach the reader about the fundamental technical arguments that arise when researching conformal prediction and related questions in distribution-free inference. Many of these…
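(For a taste of the subject: the split conformal construction at the heart of this literature fits in a few lines. A sketch for regression with absolute-residual scores:)

```python
import numpy as np

def split_conformal(model, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal intervals: take a quantile of held-out absolute
    residuals, then pad test predictions by it. Gives (1 - alpha)
    marginal coverage under exchangeability."""
    scores = np.abs(y_cal - model.predict(X_cal))          # nonconformity scores
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)   # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    preds = model.predict(X_test)
    return preds - q, preds + q
```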
Check out our new paper on Recursive Transformers. Great having Sangmin here at @GoogleDeepMind to lead it! Particularly excited about the potential for continuous depth-wise batching for much better early-exiting batch throughput.
🚀 Excited to share our latest research @GoogleDeepMind on ♻️Recursive Transformers! We make smaller LMs by "sharing parameters" across layers. A novel serving paradigm, ✨Continuous Depth-wise Batching, with 🏃Early-Exiting could significantly boost their decoding speed! 🧵👇
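(A toy sketch of why parameter sharing enables the serving trick: since every depth runs the same block, hidden states at different depths can be stacked into one batched forward pass, and slots freed by early exits refill from the queue. The exit_fn hook and equal sequence lengths are simplifying assumptions, not the paper's implementation.)

```python
import torch

def serve(shared_block, requests, exit_fn, max_batch=8, max_depth=6):
    """Toy continuous depth-wise batching loop. shared_block could be,
    e.g., an nn.TransformerEncoderLayer(batch_first=True); requests are
    (1, seq, d_model) tensors of equal seq length; exit_fn (hypothetical)
    maps a hidden state to True when its token should exit early."""
    queue = list(requests)
    active, depths, outputs = [], [], []
    while queue or active:
        while queue and len(active) < max_batch:       # refill freed slots
            active.append(queue.pop(0)); depths.append(0)
        h = shared_block(torch.cat(active, dim=0))     # one mixed-depth batch
        active = list(h.split(1, dim=0))
        depths = [d + 1 for d in depths]
        keep_a, keep_d = [], []
        for x, d in zip(active, depths):
            if d >= max_depth or exit_fn(x):
                outputs.append(x)                      # done: frees a batch slot
            else:
                keep_a.append(x); keep_d.append(d)
        active, depths = keep_a, keep_d
    return outputs
```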
This work was led by the amazing @setlur_amrith during his internship at Google Research. With @nagpalchirag, @adamjfisch, @younggeng, @jacobeisenstein, @agarwl_, Alekh Agarwal, and @JonathanBerant.