Nimit Kalra @ ICML 2025

@qw3rtman

currently feynman technique-ing my way through life. research @haizelabs, prev @citadel

nyc

Joined October 2011

901 Following

1K Followers

Pinned

Nimit Kalra @ ICML 2025 @qw3rtman · Jul 16

Flying out to #ICML2025 tonight! Always down to chat about unverifiable domains, evals, red-teaming, safeguards, or just meet cool people. I’ll be a panelist at the Methods and Opportunities at Small Scale workshop, sharing our work on tiny generalist reward models…


3.0K


Nimit Kalra @ ICML 2025 @qw3rtman · Jun 26

Discussing "Mind the Gap" tonight at @haizelabs's NYC AI Reading Group with @leonardtang_ and @willccbb. Authors study self-improvement through the "Generation-Verification Gap" (model's verification ability over its own generations) and find that this capability log scales with…

Nimit Kalra @ ICML 2025 @qw3rtman · Jun 7

Still noodling on this, but the generation-verification gap proposed by @yus167 @_hanlin_zhang_ @ShamKakade6 @udayaghai et al. in arxiv.org/abs/2412.02674 is a very nice framework that unifies a lot of thoughts around self-improvement/verification/bootstrapping reasoning

9.0K

Nimit Kalra @ ICML 2025 Retweeted

Wing Lian (caseus) @winglian · Jul 15

The current state of the ecosystem for post-training using GRPO w/ vllm + flash attention is frustratingly brittle. - The most recent vllm only supports PyTorch==2.7.0 - vllm requires xformers, but specifically only v0.0.30 is supported for torch 2.7.0. Any prior version of…

281

165

49.0K

Nimit Kalra @ ICML 2025 @qw3rtman · Jul 10

can’t even escape the arxiv speak in the group chat

459

Nimit Kalra @ ICML 2025 @qw3rtman · Jul 7

Vogent has a fantastic battle-tested inference stack, glad to see they opened it up + already have a finetuning product. From what I've seen, open-source voice models solve the 0 → 1 quite well but require a lot of post-hoc tuning to get right

Vogent @vogentai · Jul 7

Today we're launching Vogent Voicelab: an optimized API to run top open-source voice models, like Sesame's CSM-1B, Dia, Orpheus, and more.

1.0K

Nimit Kalra @ ICML 2025 @qw3rtman · Jul 1

chart crime so bad you gotta transcribe the values by hand and plot it yourself

601

Nimit Kalra @ ICML 2025 @qw3rtman · Jun 30

evals evals evals

Brendan (can/do) @BrendanFoody · Jun 30

Mercor (@mercor_ai) is now working with 6 out of the Magnificent 7, all of the top 5 AI labs, and most of the top application layer companies. One trend is common across every customer: we are entering The Era of Evals. RL is becoming so effective that models will be able to…

2.0K

Nimit Kalra @ ICML 2025 Retweeted

Leonard Tang @leonardtang_ · Jun 27

New open-source alert! spoken: a unified abstraction over realtime speech-to-speech foundation models. Run any S2S model from OpenAI, Google, Amazon — one interface with one line of code.

11.0K

Nimit Kalra @ ICML 2025 @qw3rtman · Jun 26

qwen RL has felt icky recently, but these authors get llama RL to match

Zengzhi Wang @SinclairWang1 · Jun 26

What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?…

9.0K