Ruchit Rawal
@RawalRuchit
CS Grad Student @UMDCS | Past: MPI-SWS, IISc & NSIT | Working on multi-modal understanding, robustness, & synthetic data generation. Melancholically optimistic
Introducing ARGUS 👁️ A benchmark for measuring hallucinations and omissions in free-form captions generated by Video-LLMs.

Updates from our depth-recurrent model adventure: 📈 KV cache sharing across recurrences = higher accuracy, faster, with less memory 🔧 Now with vLLM integration & finetuning examples! github.com/seal-rg/recurr…
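A back-of-the-envelope sketch of why sharing the cache helps memory (hypothetical sizes and names, not the repo's actual config or API): keeping one KV cache per (layer, recurrence step) vs. one cache per layer reused across all recurrence steps.

```python
# Hypothetical sizes for illustration: compare per-recurrence KV caches
# against a single cache per layer shared by every recurrence step.
n_layers, n_recurrences, seq_len, n_heads, head_dim = 4, 8, 2048, 8, 64
bytes_fp16 = 2

def kv_cache_bytes(n_caches):
    # each cache stores a K and a V tensor of shape (seq_len, n_heads, head_dim)
    return n_caches * 2 * seq_len * n_heads * head_dim * bytes_fp16

separate = kv_cache_bytes(n_layers * n_recurrences)  # one cache per (layer, recurrence)
shared = kv_cache_bytes(n_layers)                     # one cache per layer, reused each recurrence
print(f"separate: {separate / 2**20:.0f} MiB, shared: {shared / 2**20:.0f} MiB")
# separate: 128 MiB, shared: 16 MiB
```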
A recurrent depth/Huginn-3.5B Update: I originally wanted to post these more often, but I guess time is a river, and I just don't like posting all that much yet... The most interesting finding about the depth recurrent model has been this unassuming chart, actually:
(Structured) Model pruning is a nice tool when you really need to deploy a model that is a *bit* smaller, but don't want to deploy a bigger hammer like quantization. We recently published an improved *automated* model pruning method, surprisingly based on model merging:
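For context, a generic illustration of what *structured* pruning means (removing whole output neurons by L2 norm so the layer is genuinely smaller and dense), not the merging-based method from the paper:

```python
# Generic structured pruning sketch, for illustration only: drop whole output
# neurons of a linear layer by weight norm. Downstream layers would need their
# input dimensions re-wired to match the pruned width.
import torch
import torch.nn as nn

def prune_linear_rows(layer: nn.Linear, keep_frac: float = 0.75) -> nn.Linear:
    scores = layer.weight.detach().norm(dim=1)       # one score per output neuron
    k = max(1, int(keep_frac * layer.out_features))
    keep = scores.topk(k).indices.sort().values      # neurons to retain
    pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    pruned.weight.data = layer.weight.data[keep].clone()
    if layer.bias is not None:
        pruned.bias.data = layer.bias.data[keep].clone()
    return pruned

print(prune_linear_rows(nn.Linear(512, 512)))  # Linear(in_features=512, out_features=384, ...)
```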
Test cricket at its finest.
Ben Stokes offers a draw. - India declines and continues to bat.
🚨Announcing Zebra-CoT, a large-scale dataset of high quality interleaved image-text reasoning traces 📜. Humans often draw visual aids like diagrams when solving problems, but existing VLMs reason mostly in pure text. 1/n
Many VLMs claim to process hours of video. But can they follow the story?🤔 Today, we introduce TimeScope: The benchmark that separates true temporal understanding from marketing hype. Let's see how much VLMs really understand!⏳
🤖 Transformers can write poetry, code, and generate stunning art, but can they predict seemingly random numbers? We show that they learn to predict simple PRNGs (LCGs) by figuring out prime factorization on their own!🤯 Find Darshil tomorrow, 11am at #ICML2025 poster session!
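For readers unfamiliar with the setup, this is what an LCG is: x_{n+1} = (a·x_n + c) mod m. The model is trained to continue such sequences, and the paper's claim is that it does so by picking up the factor structure of m. Constants below are the classic glibc-style values, used purely for illustration.

```python
# A linear congruential generator (LCG): the kind of sequence the transformer
# is asked to continue. Constants are illustrative, not taken from the paper.
def lcg(seed, a=1103515245, c=12345, m=2**31, n_terms=10):
    x, seq = seed, []
    for _ in range(n_terms):
        x = (a * x + c) % m
        seq.append(x)
    return seq

print(lcg(42))  # next-token prediction target: the following term of this sequence
```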
🚨 ICML 2025 🚨 I'll be at @icmlconf Mon-Fri. DM if you'd like to chat!☕️ Also come check out our poster on cracking Pseudo-Random Number Generators with Transformers! 🕚 Tuesday @ 11am #⃣ E-1206 🔗arxiv.org/abs/2502.10390
Introducing MORSE-500 🌐 morse-500.github.io 500 scripted videos that stress-test six reasoning skills — beyond math, beyond static pics, built to get harder. Key Features: 🚀 Fresh & Portable 🎯 Diverse Categories 👁️ Pure Visual Cues 📈 Scalable Difficulty Dive in 🧵
🎉ArgusBench is accepted to ICCV 2025!! 🌊 It's a new benchmark for evaluating hallucinations & omissions in Video-LLM dense captions! - Unlike QA-based metrics, we focus on open-ended text generation. Why? Verification ≠ Generation! - How it works: We match generated…
ARGUS 👁️ has set its sights on Hawaii! Catch us at @ICCVConference this fall 🏖️
We broke @cluely's “Cheat on Everything” tool… using an audio prompt injection. What happened next was pure gold. Watch the full video 👇
Most papers discuss the hallucination problem in visual language models. In this paper, we present a framework to quantify both the hallucination and omission problems in modern video LLMs. Both the dataset and benchmarking code are out!