Rishabh Agarwal
@agarwl_
Reinforcement Learner @AIatMeta, Adjunct Prof at McGill. Ex DeepMind, Brain, Mila, IIT Bombay. NeurIPS Best Paper
I recently gave a tutorial on knowledge distillation for LLMs, explaining the mathematical derivations behind the commonly used methods. Sharing the slides here given the recent interest in this topic. drive.google.com/file/d/1xMohjQ…
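(The slides go through the derivations properly; purely as a minimal sketch of the most common setup — token-level distillation with a forward KL between teacher and student next-token distributions — here's roughly what that objective looks like in code. The temperature and tensor shapes are illustrative assumptions, not taken from the slides.)

```python
# Minimal sketch of token-level KD: forward KL(teacher || student) over next-token
# distributions. Placeholder temperature/shapes; not the exact losses from the slides.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temperature=1.0):
    """student_logits, teacher_logits: [batch, seq_len, vocab]."""
    t_logprobs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(p_T || p_S) = sum_v p_T(v) * (log p_T(v) - log p_S(v)), per position
    kl = (t_logprobs.exp() * (t_logprobs - s_logprobs)).sum(dim=-1)
    return kl.mean()
```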

Since R1 there has been a lot of chatter 💬 on post-training LLMs with RL. Is RL only sharpening the distribution over correct responses sampled by the pretrained LLM OR is it exploring and discovering new strategies 🤔? Find answers in our latest post ⬇️ tinyurl.com/rlshadis
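(For intuition on what "sharpening" means — my own toy numerical illustration, not the post's analysis: if you renormalize the base model's distribution over its own verified-correct samples, you can only boost strategies the base model already assigns mass to; a strategy with zero base probability stays at zero.)

```python
# Toy illustration of "sharpening": renormalizing the base model's distribution
# over its own correct samples never surfaces a strategy with zero base probability.
import numpy as np

base_probs = np.array([0.60, 0.25, 0.10, 0.05, 0.00])  # last entry: a genuinely novel strategy
correct    = np.array([0,    1,    1,    0,    1   ])   # verifier labels per strategy

sharpened = base_probs * correct
sharpened /= sharpened.sum()
print(sharpened)  # [0. 0.714 0.286 0. 0.] -- index 4 (the novel strategy) stays at 0
```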
An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International Mathematical Olympiad. 🥇 It solved 5️⃣ out of 6️⃣ exceptionally difficult problems, involving algebra, combinatorics, geometry and number theory. Here’s how 🧵
I wrote up this post about how we should **unify RL and next-token-prediction**, based on my perspective on how humans learn new languages. Then I realized @jxmnop wrote the exact same thing about how we should scale RL to 10^26 FLOPs
Kimi K2 is here! The first big beautiful model purpose-built for agentic capabilities is now open-source! Agent RL, ready for takeoff!
🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹 Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…
The age of transformers is ending...the dawn of linear-cost architectures is upon us. Power Attention replaces Flash Attention in any transformer, and removes the quadratic penalty of context scaling while achieving strong performance. The result: domination of both transformers…
Releasing Power Attention: manifestai.com/articles/relea…
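(Power Attention's actual formulation is in the linked article; as a generic point of comparison only — not their method — this is the standard kernelized linear-attention trick that avoids materializing the N×N score matrix by reassociating the matmuls, so cost grows linearly in sequence length.)

```python
# Generic (non-causal) linear-attention sketch, NOT Power Attention itself.
# With a positive feature map phi, softmax(QK^T)V is approximated by
# phi(Q) @ (phi(K)^T V), so the N x N score matrix is never formed:
# cost is O(N * d^2) instead of O(N^2 * d).
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """q, k, v: [batch, heads, seq_len, head_dim]."""
    phi = lambda x: F.elu(x) + 1.0                      # simple positive feature map
    q, k = phi(q), phi(k)
    kv = torch.einsum("bhnd,bhne->bhde", k, v)          # summary of K/V, linear in n
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)
```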
youtu.be/E22AOHAEtu4?si… Great talk. Thanks @shuchaobi for delivering it, and @CUSEAS for uploading it.
Excited to share what I worked on during my time at Meta. - We introduce a Triton-accelerated Transformer with *2-simplicial attention*—a tri-linear generalization of dot-product attention - We show how to adapt RoPE to tri-linear forms - We show 2-simplicial attention scales…
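(To make the "tri-linear" part concrete, here's a naive O(n³) reference for the score: each query attends to *pairs* of positions via two key projections. This is just the einsum definition — not the Triton kernel, not the RoPE adaptation, and the value aggregation below is one plausible choice, not necessarily the paper's.)

```python
# Naive reference for a tri-linear (2-simplicial) attention score:
# s[i, j, k] = sum_d q[i, d] * k1[j, d] * k2[k, d], softmaxed over pairs (j, k).
import torch

def two_simplicial_attention(q, k1, k2, v1, v2):
    """All inputs: [seq_len, head_dim]; returns [seq_len, head_dim]."""
    scores = torch.einsum("id,jd,kd->ijk", q, k1, k2) / q.shape[-1] ** 0.5
    n = scores.shape[0]
    attn = torch.softmax(scores.reshape(n, -1), dim=-1).reshape(n, n, n)
    # One plausible aggregation over the attended pair (j, k): elementwise
    # product of the two value vectors. The paper's exact choice may differ.
    return torch.einsum("ijk,jd,kd->id", attn, v1, v2)
```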
Join my team at @genesistxai ! 🧬 We're forging AI foundation models to unlock groundbreaking therapies for patients with severe diseases. We're hiring ML Scientists, Engineers, TPMs & Interns in foundation models, #LLMs , #RL, #diffusion models, and other cutting-edge areas of…
Day 1/5 of #MiniMaxWeek: We’re open-sourcing MiniMax-M1, our latest LLM — setting new standards in long-context reasoning. - World’s longest context window: 1M-token input, 80k-token output - State-of-the-art agentic use among open-source models - RL at unmatched efficiency:…
Introducing e3 🔥 Best <2B model on math 💪 Are LLMs implementing algos ⚒️ OR is thinking an illusion 🎩? Is RL only sharpening the base LLM distrib. 🤔 OR discovering novel strategies outside base LLM 💡? We answer these ⤵️ 🚨 arxiv.org/abs/2506.09026 🚨 matthewyryang.github.io/e3/
👉 New preprint on a new family of Transformer-type models whose depth scales logarithmically with sequence length. Enables: - fast training - fast decoding - large memory capacity in associative recall - strong length generalization on state tracking
Transformers: ⚡️fast to train (compute-bound), 🐌slow to decode (memory-bound). Can Transformers be optimal in both? Yes! By exploiting sequential-parallel duality. We introduce Transformer-PSM with constant time per token decode. 🧐 arxiv.org/pdf/2506.10918
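(A rough way to see the compute-bound vs memory-bound framing — back-of-the-envelope only, not their analysis, with made-up model sizes: each decoded token in a standard transformer has to re-read the entire KV cache, so memory traffic grows with position, while a constant-size decoding state reads the same amount every step.)

```python
# Back-of-the-envelope memory traffic per decoded token (illustrative numbers only).
# Standard attention re-reads the whole KV cache each step (grows with position t);
# a constant-size recurrent/prefix-scan state reads the same bytes every step.
def kv_cache_bytes_read(t, layers=32, heads=32, head_dim=128, bytes_per=2):
    return t * layers * heads * head_dim * 2 * bytes_per  # K and V for t positions

def constant_state_bytes_read(state_bytes=64 * 1024 * 1024):  # arbitrary placeholder size
    return state_bytes                                        # independent of position t

for t in (1_000, 10_000, 100_000):
    print(f"t={t}: KV read ~{kv_cache_bytes_read(t)/1e9:.1f} GB "
          f"vs constant-state ~{constant_state_bytes_read()/1e9:.2f} GB")
```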
Last day today @AIatMeta, reflecting on the last several months, and wanted to highlight a few things I enjoyed working on: Building new algorithms for on-policy distillation with @DatHuynh13 Science of end-to-end thinking models @agarwl_ and many others Working prototype of…
Learning to play Atari from pixels from scratch in 30 minutes, all locally on an Apple Watch!
Slides here for my CVPR talk: drive.google.com/file/d/1xd9gPM… @anoopcherian will probably know about the recording

A few more observations after replicating the Tower of Hanoi game with their exact prompts: - You need AT LEAST 2^N - 1 moves and the output format requires 10 tokens per move + some constant stuff. - Furthermore the output limit for Sonnet 3.7 is 128k, DeepSeek R1 64K, and…
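(Using the tweet's own estimates — 2^N - 1 moves, ~10 output tokens per move, the stated output limits — a quick check of when the move list alone blows the budget:)

```python
# Quick check with the tweet's estimates: optimal Tower of Hanoi needs 2^N - 1 moves
# at roughly 10 output tokens per move (ignoring the constant overhead), so the full
# move list alone exceeds the stated output limits well before reasoning is the issue.
LIMITS = {"Sonnet 3.7": 128_000, "DeepSeek R1": 64_000}
TOKENS_PER_MOVE = 10

for n in range(10, 16):
    needed = (2**n - 1) * TOKENS_PER_MOVE
    over = [name for name, lim in LIMITS.items() if needed > lim]
    print(f"N={n}: ~{needed:,} tokens; exceeds: {over or 'none'}")
```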
Apple just GaryMarcus'd LLM reasoning ability
🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws 🤔How to effectively build a powerful reasoning agent? Existing compute-optimal scaling laws suggest 64K thinking tokens + 1.7B model > 32B model. But, It only shows half of the picture! 🚨 The O(N²)…
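(A rough version of the cost comparison the thread points at — my back-of-the-envelope, not the paper's cost model, with placeholder d_model/layer counts and an assumed short budget for the 32B model: the dense FLOPs per token scale with parameter count, but attention over a long chain of thought adds a term that grows with generated length, so 64K thinking tokens from a small model is not free.)

```python
# Back-of-the-envelope generation cost (NOT the paper's model): dense FLOPs per token
# ~ 2 * n_params, plus an attention term growing with tokens already generated
# -- the O(N^2) piece the thread mentions. Model dims below are placeholders.
def gen_cost(n_params, n_tokens, d_model, n_layers):
    dense = 2 * n_params * n_tokens
    attention = sum(4 * n_layers * d_model * t for t in range(n_tokens))  # ~O(N^2)
    return dense + attention

small_long = gen_cost(n_params=1.7e9, n_tokens=64_000, d_model=2048, n_layers=28)
big_short  = gen_cost(n_params=32e9,  n_tokens=4_000,  d_model=5120, n_layers=64)
print(f"small+long: {small_long:.2e} FLOPs, big+short: {big_short:.2e} FLOPs")
```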
Good take -- it's a good benchmark to develop better training algorithms / inference time scaling, which you can validate on other domains. Random / incorrect rewards won't work on this one. Main gotcha is to not overfit to just ARC-like puzzles.
people stopped working on ARC-AGI because they realized it was too hard
Giving my first ever invited talk at @CVPR , during the multimodal reasoning workshop: The Bitter Lesson for RL: Verification as the Key to Reasoning LLMs This talk is inspired by the two classic essays from Rich Sutton:


a great video by @jbhuang0604 explaining KL divergence and its computation
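(If you want to play with the computation yourself — my own snippet, not from the video: the standard Monte Carlo estimators of KL(q‖p) from samples of q, including the low-variance k3 estimator that shows up in RLHF codebases.)

```python
# Monte Carlo estimators of KL(q || p) from samples x ~ q, given r = log p(x) - log q(x).
# k1 = -r is unbiased but noisy; k3 = (exp(r) - 1) - r is low-variance and non-negative.
import numpy as np

rng = np.random.default_rng(0)
q_mean, p_mean = 0.0, 0.5
x = rng.normal(q_mean, 1.0, size=100_000)                # samples from q = N(0, 1)
r = -0.5 * (x - p_mean) ** 2 + 0.5 * (x - q_mean) ** 2   # log p(x) - log q(x), p = N(0.5, 1)

k1 = np.mean(-r)                     # -E_q[log p/q]
k3 = np.mean(np.expm1(r) - r)        # E_q[(p/q - 1) - log(p/q)]
true_kl = 0.5 * (p_mean - q_mean) ** 2   # closed form for unit-variance Gaussians
print(k1, k3, true_kl)               # all approximately 0.125
```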
So many works talking about entropy, but what is the **mechanism** of entropy in RL for LLMs? 🤔 Our work gives a principled understanding, as well as two tricks that get entropy **controlled** 🧵
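(For reference — a generic snippet, not the paper's mechanism or its two tricks: the token-level policy entropy that RL-for-LLMs runs typically log, i.e. the quantity whose collapse or blow-up these analyses are about.)

```python
# Generic token-level policy entropy as typically logged during RL training of LLMs:
# H = -sum_v p(v) log p(v), averaged over non-padding positions. Monitoring only;
# this is not the paper's analysis or its entropy-control tricks.
import torch
import torch.nn.functional as F

def mean_token_entropy(logits, mask):
    """logits: [batch, seq_len, vocab]; mask: [batch, seq_len], 1 for real tokens."""
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)   # [batch, seq_len]
    return (entropy * mask).sum() / mask.sum()
```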