Ruchit Rawal
@RawalRuchit
CS Grad Student @UMDCS | Past: MPI-SWS, IISc & NSIT | Working on multi-modal understanding, robustness, & synthetic data generation. Melancholically optimistic
Introducing ARGUS 👁️ A benchmark for measuring hallucinations and omissions in free-form captions generated by Video-LLMs.

Updates from our depth-recurrent model adventure: 📈 KV cache sharing across recurrences = higher accuracy, faster, with less memory 🔧 Now with vLLM integration & finetuning examples! github.com/seal-rg/recurr…
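A back-of-the-envelope sketch of why sharing the cache helps memory (hypothetical sizes and names, not the repo's actual config or API): keeping one KV cache per (layer, recurrence step) vs. one cache per layer reused across all recurrence steps.

```python
# Hypothetical sizes for illustration: compare per-recurrence KV caches
# against a single cache per layer shared by every recurrence step.
n_layers, n_recurrences, seq_len, n_heads, head_dim = 4, 8, 2048, 8, 64
bytes_fp16 = 2

def kv_cache_bytes(n_caches):
    # each cache stores a K and a V tensor of shape (seq_len, n_heads, head_dim)
    return n_caches * 2 * seq_len * n_heads * head_dim * bytes_fp16

separate = kv_cache_bytes(n_layers * n_recurrences)  # one cache per (layer, recurrence)
shared = kv_cache_bytes(n_layers)                     # one cache per layer, reused each recurrence
print(f"separate: {separate / 2**20:.0f} MiB, shared: {shared / 2**20:.0f} MiB")
# separate: 128 MiB, shared: 16 MiB
```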
A recurrent depth/Huginn-3.5B Update: I originally wanted to post these more often, but I guess time is a river, and I just don't like posting all that much yet... The most interesting finding about the depth recurrent model has been this unassuming chart, actually:
(Structured) Model pruning is a nice tool when you really need to deploy a model that is a *bit* smaller, but don't want to deploy a bigger hammer like quantization. We recently published an improved *automated* model pruning method, surprisingly based on model merging:
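For context, a generic illustration of what *structured* pruning means (removing whole output neurons by L2 norm so the layer is genuinely smaller and dense), not the merging-based method from the paper:

```python
# Generic structured pruning sketch, for illustration only: drop whole output
# neurons of a linear layer by weight norm. Downstream layers would need their
# input dimensions re-wired to match the pruned width.
import torch
import torch.nn as nn

def prune_linear_rows(layer: nn.Linear, keep_frac: float = 0.75) -> nn.Linear:
    scores = layer.weight.detach().norm(dim=1)       # one score per output neuron
    k = max(1, int(keep_frac * layer.out_features))
    keep = scores.topk(k).indices.sort().values      # neurons to retain
    pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    pruned.weight.data = layer.weight.data[keep].clone()
    if layer.bias is not None:
        pruned.bias.data = layer.bias.data[keep].clone()
    return pruned

print(prune_linear_rows(nn.Linear(512, 512)))  # Linear(in_features=512, out_features=384, ...)
```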
Test cricket at its finest.
Ben Stokes offers a draw. - India declines and continues to bat.
🚨Announcing Zebra-CoT, a large-scale dataset of high quality interleaved image-text reasoning traces 📜. Humans often draw visual aids like diagrams when solving problems, but existing VLMs reason mostly in pure text. 1/n
Many VLMs claim to process hours of video. But can they follow the story?🤔 Today, we introduce TimeScope: The benchmark that separates true temporal understanding from marketing hype. Let's see how much VLMs really understand!⏳
🤖 Transformers can write poetry, code, and generate stunning art, but can they predict seemingly random numbers? We show that they learn to predict simple PRNGs (LCGs) by figuring out prime factorization on their own!🤯 Find Darshil tomorrow, 11am at #ICML2025 poster session!
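For readers unfamiliar with the setup, this is what an LCG is: x_{n+1} = (a·x_n + c) mod m. The model is trained to continue such sequences, and the paper's claim is that it does so by picking up the factor structure of m. Constants below are the classic glibc-style values, used purely for illustration.

```python
# A linear congruential generator (LCG): the kind of sequence the transformer
# is asked to continue. Constants are illustrative, not taken from the paper.
def lcg(seed, a=1103515245, c=12345, m=2**31, n_terms=10):
    x, seq = seed, []
    for _ in range(n_terms):
        x = (a * x + c) % m
        seq.append(x)
    return seq

print(lcg(42))  # next-token prediction target: the following term of this sequence
```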
🚨 ICML 2025 🚨 I'll be at @icmlconf Mon-Fri. DM if you'd like to chat!☕️ Also come check out our poster on cracking Pseudo-Random Number Generators with Transformers! 🕚 Tuesday @ 11am #⃣ E-1206 🔗arxiv.org/abs/2502.10390
Introducing MORSE-500 🌐 morse-500.github.io 500 scripted videos that stress-test six reasoning skills — beyond math, beyond static pics, built to get harder. Key Features: 🚀 Fresh & Portable 🎯 Diverse Categories 👁️ Pure Visual Cues 📈 Scalable Difficulty Dive in 🧵
🎉ArgusBench is accepted to ICCV 2025!! 🌊 It's a new benchmark for evaluating hallucinations & omissions in Video-LLM dense captions! - Unlike QA-based metrics, we focus on open-ended text generation. Why? Verification ≠ Generation! - How it works: We match generated…
ARGUS 👁️ has set its sights on Hawaii! Catch us at @ICCVConference this fall 🏖️
We broke @cluely's “Cheat on Everything” tool… using an audio prompt injection. What happened next was pure gold. Watch the full video 👇
Most papers discuss the hallucination problem in visual language models. In this paper, we present a framework to quantify both the hallucination and omission problems in modern video LLMs. Both the dataset and benchmarking code are out!