David Chan
@_dmchan
Postdoc at @berkeley_ai studying contextual grounding in multimodal AI. These are the voyages of the... Crap. I don't have a name for my own ship...
Awesome work exploring the power of serial computing!
Some problems can’t be rushed—they can only be done step by step, no matter how many people or processors you throw at them. We’ve scaled AI by making everything bigger and more parallel: Our models are parallel. Our scaling is parallel. Our GPUs are parallel. But what if the…
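A toy illustration of the "step by step" point (my own sketch, not from the paper): an iterated recurrence where each step needs the previous step's output, so throwing more processors at it doesn't make it faster.

```python
# Hypothetical illustration of an inherently sequential computation.
# Each iterate depends on the previous one, so the loop cannot be
# split across processors without changing the result.
def iterate(f, x0, steps):
    x = x0
    for _ in range(steps):
        x = f(x)  # step t+1 needs the output of step t
    return x

# Example: the logistic map, a simple chaotic recurrence.
print(iterate(lambda x: 3.9 * x * (1 - x), 0.5, 1000))
```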
Me (to Cursor): Refactor this code.
Cursor: Sure! I've refactored your code! It's shorter and cleaner now!
Me: Are you sure there are no feature regressions?
Cursor: The code is missing essential functionality.
Me: ....
📢 Call for Papers! Last chance to hang with the CV crowd in Hawaii 🌴 We're hosting the 4th MMFM Workshop at #ICCV2025 — submit your work on vision, language, audio & more by July 1 🗓️ Also check out the CVPR edition 👉 @MMFMWorkshop 🔗 sites.google.com/view/mmfm4thwo…
🚀 Call for Papers! 🚀 Excited to help organize the 4th Workshop on What is Next in Multimodal Foundation Models? at ICCV in Honolulu, Hawai'i 🌺 Submit work on vision, language, audio & more! 🗓️ Deadline: July 1, 2025 🔗 sites.google.com/view/mmfm4thwo… #MMFM4 #ICCV2025 #AI #multimodal
🚨 Rough luck with your #ICCV2025 submission? We’re organizing the 4th Workshop on What’s Next in Multimodal Foundation Models at @ICCVConference in Honolulu 🌺🌴 Send us your work on vision, language, audio & more! 🗓️ Deadline: July 1, 2025 🔗 sites.google.com/view/mmfm4thwo…
🤔 Do LLMs exhibit in-group↔out-group perceptions like us? ❓ Can they serve as faithful virtual subjects of human political partisans? Excited to share our paper on taking LLM virtual personas to the *next level* of depth! 🔗 arxiv.org/abs/2504.11673 🧵
Submit your paper to our Multimodal Foundation Models (MMFM) Workshop at ICCV in Honolulu, Hawaii
🚀 Call for Papers! 🚀 Excited to help organize the 4th Workshop on What is Next in Multimodal Foundation Models? at ICCV in Honolulu, Hawai'i 🌺 Submit work on vision, language, audio & more! 🗓️ Deadline: July 1, 2025 🔗 sites.google.com/view/mmfm4thwo… #MMFM4 #ICCV2025 #AI #multimodal
Excited to introduce our new work! Can VLMs solve puzzles that are hard for humans?
🔍 Just dropped: “Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint” 👉 arxiv.org/abs/2505.23759 Puns + pictures + positioning = a nightmare for today’s AI. These models just don’t get it (yet).😵💫 Check out the 🧵 to see our findings (1/4) #AI #Multimodal #VLM
Ever wondered if the way we feed image patches to vision models is the best way? The standard row-by-row scan isn't always optimal! Modern long-sequence transformers can be surprisingly sensitive to patch order. We developed REOrder to find better, task-specific patch sequences.
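For anyone unfamiliar with "patch order": a vision transformer flattens the image into patches and feeds them as a sequence, usually in row-by-row raster order, and any permutation of that sequence is an alternative ordering. A minimal sketch (my own, not the REOrder code):

```python
import numpy as np

# Rough illustration (not the REOrder implementation): turn an image into
# a sequence of patches, then reorder that sequence with a permutation.
def patchify(image, patch=16):
    h, w, c = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4)         # grid of patches
    return patches.reshape(-1, patch * patch * c)      # raster-scan order

image = np.random.rand(224, 224, 3)
seq = patchify(image)                                  # shape (196, 768)

# A different, task-specific ordering is just a permutation of the rows.
perm = np.random.permutation(len(seq))                 # stand-in for a learned order
reordered = seq[perm]
```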
This is pretty insane: intology.ai/blog/zochi-acl. But also, I wonder what they put for Section E.1 on the ACL checklist 👀
Our work has been accepted to #ACL2025 ! Check out our paper: arxiv.org/abs/2503.04722.
Can an LLM flip a biased coin? No! Can LLMs update their priors with In-Context Learning? Yes! Check out our work "Enough Coin Flips Can Make LLMs Act Bayesian"
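Roughly what "updating priors" looks like for a coin, as a toy Beta-Bernoulli example (my own illustration, not the paper's setup):

```python
# Toy Beta-Bernoulli update (illustration only): with prior Beta(a, b),
# observing h heads and t tails gives the posterior Beta(a + h, b + t).
def posterior_mean(a, b, flips):
    heads = sum(flips)
    tails = len(flips) - heads
    return (a + heads) / (a + b + heads + tails)

flips = [1, 1, 0, 1, 1, 1, 0, 1]          # mostly heads: evidence of a biased coin
print(posterior_mean(1, 1, flips))         # uniform prior -> posterior mean 0.7
```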
We all learned DFS in undergrad — but did you know it can fix hallucinations in VLMs? 💡 Meet REVERSE-VLM: a self-correcting model using DFS-style backtracking + resampling 📉 12% fewer hallucinations (CHAIR-MSCOCO) 📈 28% more accurate (HaloQuest) 🔗 reverse-vlm.github.io
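The DFS analogy, very roughly (my own sketch, not the REVERSE-VLM implementation): propose a span, check it with a verifier, and if it fails, back up and resample from that point instead of pushing forward. The two helper functions below are hypothetical stand-ins.

```python
import random

# Sketch of DFS-style backtracking decoding (illustration only). A real system
# would call the VLM to propose a span and a detector to flag likely hallucinations.
def generate_next_span(prefix):
    return random.choice([" a cat on a mat.", " three purple elephants."])

def verifier_ok(prefix, span):
    return "elephants" not in span             # pretend hallucinations mention elephants

def generate_with_backtracking(prefix, depth, max_retries=3):
    if depth == 0:
        return prefix
    for _ in range(max_retries):
        span = generate_next_span(prefix)      # propose the next chunk of text
        if verifier_ok(prefix, span):          # keep only verified spans
            result = generate_with_backtracking(prefix + span, depth - 1, max_retries)
            if result is not None:
                return result                  # this branch completed cleanly
        # otherwise backtrack: discard the span and resample from this point
    return None                                # no verified continuation found

print(generate_with_backtracking("The photo shows", depth=2))
```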
News: Search Arena is now LIVE! 🌐🔍 ✅ Test web-augmented LLM systems on real-time, real-world tasks — retrieval, writing, debugging & more. ✅ Perplexity, Gemini, OpenAI go head-to-head. ✅ Crowd-powered evals. Leaderboard 🏆 coming soon… ⚡Try it now at lmarena.ai!
🚨 Large video-language models like LLaVA-Video can do single-video tasks. But can they compare videos? Imagine you’re learning a sports skill like kicking: can an AI tell how your kick differs from an expert video? 🚀 Introducing "Video Action Differencing" (VidDiff), ICLR 2025 🧵