Rohit Saxena
@rohit_saxena
PhD Student @Edin_CDT_NLP @EdinburghNLP
📣This work will appear at the ICLR 2025 Workshop on Reasoning and Planning for LLMs.🇸🇬 I'm currently on the job market, looking for research scientist roles. Feel free to reach out if you're hiring or know of any opportunities!
LLMs can tackle math olympiad problems but... can they read a clock 🤔? 🕰️📆 Our experiments reveal surprising failures in temporal reasoning: MLLMs struggle with analogue clock reading & date inference! Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs
We couldn’t be there in person, but our poster will be at #NAACL2025! Feel free to ping @aryopg with any questions or follow-ups.
MMLU-Redux just touched down at #NAACL2025! 🎉 Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅 If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋
'Theorem Prover as a Judge for Synthetic Data Generation' has been accepted to ACL (Main) 🚀. Catch us on July 30th (Wednesday), 11:00-12:30pm, in Hall 4/5! A huge thank you to my amazing collaborators: Shay @GiwonHong413849 @WendaLi8 📝: aclanthology.org/2025.acl-long.…
New Anthropic Research: “Inverse Scaling in Test-Time Compute” We found cases where longer reasoning leads to lower accuracy. Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns. 🧵
Why do AI assistants feel so generic? Our new #ACL2025 paper, PersonaLens🔎, tackles this head-on. We built a new benchmark to test personalization in ways that matter. I'll be presenting our work at the poster session in Vienna next week! 🧵[1/4]
🚨New paper alert!🚨 "Scalpel vs. Hammer: GRPO Amplifies Existing Capabilities, SFT Replaces Them" @ActInterp ICML'25 @deepseek_ai popularised RLVR and distillation for 'reasoning training'! But how do they differ under the hood? Details in 🧵: (1/8)
Figures are taken from the paper, which was published at the ICLR 2025 Workshop on Reasoning and Planning for LLMs: openreview.net/forum?id=5gfC2… Nice work @rohit_saxena @aryopg and @PMinervini.
🔁 What if you could bootstrap a world model (state1 × action → state2) using a much easier-to-train dynamics model (state1 × state2 → action) in a generalist VLM? 💡 We show how a dynamics model can generate synthetic trajectories & serve for inference-time verification 🧵👇
Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers for the pre-RL models and realized they were severely underreported across papers. We compiled the discrepancies in a blog below🧵👇
MMLongBench Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly
Check out our MMLongBench - a long-context benchmark for vision and language models. 🚀📏 Work led by amazing @ZhaoweiWang4
🚨 New paper! 🚨 Many recent LVLMs claim massive context windows, but can they handle long contexts on diverse downstream tasks? 🤔 💡In our new paper, we find that most models still fall short! We introduce MMLongBench, the first comprehensive benchmark for long-context VLMs:…
We propose Neurosymbolic Diffusion Models! We find diffusion is especially compelling for neurosymbolic approaches, combining powerful multimodal understanding with symbolic reasoning 🚀 Read more 👇
🚀Check out VISTA - a large-scale benchmark for scientific video summarization! #ACL2025 By amazing @dongqi_me
🚨 Long Paper Accepted at @aclmeeting 2025 main conference! 🚨 🎥 Our work "What Is That Talk About? A Video-to-Text Summarization Dataset for Scientific Presentations" introduces VISTA, a large-scale benchmark for scientific video summarization. #ACL2025 #NLProc #LLMs 🧵(1/3)
Preprint: Can we learn to reason for story generation (~100k tokens), without reward models? Yes! We introduce an RLVR-inspired reward paradigm VR-CLI that correlates with human judgements of quality on the 'novel' task of Next-Chapter Prediction. Paper: arxiv.org/abs/2503.22828
📢Scaling test-time compute via generative verification (GenRM) is an emerging paradigm that has been shown to be more efficient than self-consistency (SC) for reasoning. But such claims are misleading☠️ Our compute-matched analysis shows that SC outperforms GenRM across most budgets! 🧵