Nived Rajaraman
@Nived_Rajaraman
EECS PhD student at Berkeley. Former intern at DeepMind. Reinforcement learning. I organize the BLISS seminar https://bliss.eecs.berkeley.edu/Seminar/index.html
At the ES-FoMo workshop, talk to @rish2k1 about scaling test-time compute as a function of user-facing latency (instead of FLOPs)
[Sat Jul 19] @Nived_Rajaraman & @rish2k1 present work on improving accuracy-latency tradeoffs for test-time scaling. @gh_aminian presents work showing that a smoothed version of best-of-n improves reward-vs-KL tradeoffs when a low-quality proxy reward is used.
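A minimal sketch of what a smoothed best-of-n sampler could look like: instead of returning the argmax-reward candidate (hard best-of-n), pick among the n candidates with probability proportional to exp(reward / temperature). The function names `sample_fn` and `reward_fn`, and the exponential-weighting scheme, are illustrative assumptions, not the construction from the paper itself.

```python
import math
import random

def soft_best_of_n(prompt, sample_fn, reward_fn, n=8, temperature=1.0):
    """Smoothed best-of-n (illustrative sketch).

    Hard best-of-n returns the candidate with the highest reward, which can
    overfit a noisy proxy reward and drift far (in KL) from the base policy.
    Here we instead sample a candidate with probability proportional to
    exp(reward / temperature): temperature -> 0 recovers hard best-of-n,
    large temperature recovers plain sampling from the base policy.
    """
    candidates = [sample_fn(prompt) for _ in range(n)]
    rewards = [reward_fn(prompt, c) for c in candidates]
    m = max(rewards)  # subtract the max before exponentiating, for stability
    weights = [math.exp((r - m) / temperature) for r in rewards]
    total = sum(weights)
    pick = random.random() * total  # inverse-CDF sampling over the weights
    acc = 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if pick <= acc:
            return cand
    return candidates[-1]
```

As the temperature shrinks, the weights concentrate on the top-reward candidate, so the sampler interpolates between the base policy and hard best-of-n.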
The abstract submission deadline for FoPT has been extended to the 21st of May (11:59pm UTC). Submission website: openreview.net/group?id=learn…
Announcing the first workshop on Foundations of Post-Training (FoPT) at COLT 2025! 📝 Soliciting abstracts/posters exploring theoretical & practical aspects of post-training and RL with language models! │ 🗓️ Deadline: May 19, 2025
A quick reminder that the deadline for abstract submissions to FoPT (May 19) is fast approaching! Submit your best work!
@nived_rajaraman will give an oral talk at the VerifAI workshop on why RL or verification is needed to effectively scale test-time compute! Lots of interesting insights in this paper! At VerifAI workshop, 3:45pm, April 27 arxiv.org/abs/2502.12118 x.com/setlur_amrith/…
🚨 RL or distillation/SFT: what to use to train next reasoning model? Which 📈 perf faster as we scale test compute? We answer these in a principled way so you don't have to burn GPUs🔥. 🎯 Ans: RL w/ rewards or verification >> SFT/distillation 😱 arxiv.org/pdf/2502.12118 🧵⤵️
Our first Rising Stars session featured fantastic talks by @TianlongChen4 @CongyueD @Nived_Rajaraman @zyh2022 Grigorios Chrysos
Thinking for longer (e.g. o1) is only one of many axes of test-time compute. In a new @Google_AI paper, we instead focus on scaling the search axis. By just randomly sampling 200x & self-verifying, Gemini 1.5 ➡️ o1 performance. The secret: self-verification is easier at scale!
Scaling Test-Time Compute Without Verification or RL is Suboptimal "In this paper, we prove that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed…
I'll be at #NeurIPS2023, and on the academic job market this year! RTs will be greatly appreciated! I work on statistics and information theory, with applications in robust statistics, offline RL, game theory, human-AI interactions and LLMs. I've recently been working on better…
This is sickening. I hope the survivor is ok. How is he still listed as teaching this year @stanford?
Scoop: Stanford biology professor Hunter Fraser was arrested and charged with domestic violence. Fraser allegedly threw a woman on the ground and later slammed a door into her, court records show. Fraser pleaded not guilty. @StanfordDaily stanforddaily.com/2022/10/04/sta…