Zikui Cai

@ZikuiCai

Postdoc @UofMaryland @umiacs

United States

Joined August 2017

93 Following

64 Followers

Pinned

Zikui Cai@ZikuiCai · Jun 10

Introducing MORSE-500
🌐 morse-500.github.io

500 scripted videos that stress-test six reasoning skills — beyond math, beyond static pics, built to get harder.

Key Features:
🚀 Fresh & Portable
🎯 Diverse Categories
👁️ Pure Visual Cues
📈 Scalable Difficulty

Dive in 🧵

14.0K

Zikui Cai@ZikuiCai · Jul 23

I value a model that visually explains its thinking for easy, precise understanding! Excited to share our new Zebra-CoT dataset, enabling VLMs to reason with images + text. Explore: arxiv.org/abs/2507.16746 #AI #MachineLearning #CoT #VisualReasoning #MultimodalReasoning #VLMs

Micah Goldblum@micahgoldblum · Jul 23

🚨Announcing Zebra-CoT, a large-scale dataset of high quality interleaved image-text reasoning traces 📜. Humans often draw visual aids like diagrams when solving problems, but existing VLMs reason mostly in pure text. 1/n

485

Zikui Cai@ZikuiCai · Jul 19

Thanks for sharing! Building and optimizing agentic systems with DSPy has been a great experience!

Omar Khattab@lateinteraction · Jul 19

AegisLLM leverages DSPy's MIPROv2 optimizer in a totally unexpected way: to evolve its prompts based on the attacks it sees in real time. Some really large gains!

5.0K

Zikui Cai@ZikuiCai · Jul 19

Thanks for sharing our work!

Ahmad Beirami@abeirami · Jul 19

If you are interested in building agentic workflows, AegisLLM is a nice instantiation in safety/security domain! Thanks @furongh for sharing it with me. Agentic workflows must be designed and optimized as systems, as @lateinteraction keeps repeating.

138

Zikui Cai Retweeted

Furong Huang@furongh · Jul 14

🐭🔒 LLM security is a cat-and-mouse game. Attackers adapt. Prompts mutate. Meanwhile, most defenses? 🚫 Static. Fragile. One-shot fixes. It’s time for something smarter. ⚔️ Meet AegisLLM: An agentic runtime defense that thinks, reacts, and learns — just like the attackers do.…

17.0K

Zikui Cai@ZikuiCai · Jul 13

There’s been heated debate lately: Can generative AI truly self-improve? ✅Some say yes, pointing to models learning like curious humans. ❌Others say no, invoking the first law of thermodynamics: You can’t get something from nothing. No new info, no gain. 🧠 But what if the…

Zikui Cai@ZikuiCai · Jun 10

12.0K

Zikui Cai Retweeted

Furong Huang@furongh · Jun 10

Excited to speak at the Workshop on Computer Vision in the Wild @CVPR 2025! 🎥🌍 🗓️ June 11 | 📍 Room 101 B, Music City Center, Nashville, TN 🎸 🧠 Talk: From Perception to Action: Building World Models for Generalist Agents Let’s connect if you're around! #CVPR2025 #robotics…

6.0K

Zikui Cai Retweeted

Ruchit Rawal@RawalRuchit · Jun 10

Introducing ARGUS 👁️ A benchmark for measuring hallucinations and omissions in free-form captions generated by Video-LLMs.

11.0K

Zikui Cai Retweeted

AK@_akhaliq · Jun 9

MORSE-500 A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning

9.0K

Zikui Cai Retweeted

Tanishq Abraham is at ICML@iScienceLuvr · Jun 9

MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning "We introduce MORSE-500 (Multimodal Reasoning Stress-test Environment), a video benchmark composed of 500 fully scripted clips with embedded questions spanning six complementary…

6.0K