Lily Chen
@lilyychenn
MIT
I am very excited about David's @ddvd233 line of work in developing generalist multimodal clinical foundation models. CLIMB (which will be presented at ICML 2025) github.com/DDVD233/climb is a large-scale benchmark comprising 4.51 million patient samples totaling 19.01 terabytes…
Thanks @iScienceLuvr for posting about our recent work! We're excited to introduce QoQ-Med, a multimodal medical foundation model that jointly reasons across medical images, videos, time series (ECG), and clinical texts. Beyond the model itself, we developed a novel training…
There are many KV cache-reduction methods, but a fair comparison is challenging. We propose a new unified metric called “critical KV footprint”. We compare existing methods and propose a new one - PruLong, which “prunes” certain attn heads to only look at local tokens. 1/7
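A minimal sketch of the head-level idea (my own illustration under assumed names like `masked_attention` and `local_heads`, not the PruLong code): heads marked "local" attend only to a recent window, so their KV entries beyond that window can be dropped, while the remaining heads keep the full cache.

```python
# Hedged sketch: restrict selected attention heads to a local window.
# Heads flagged in `local_heads` only see the last `window` keys, so their
# KV cache can be truncated; the other heads retain the full cache.
import torch

def masked_attention(q, k, v, local_heads, window=128):
    """q, k, v: [heads, seq, dim]; local_heads: bool tensor [heads] (assumed API)."""
    h, t, d = q.shape
    scores = q @ k.transpose(-1, -2) / d**0.5            # [heads, seq, seq]
    causal = torch.tril(torch.ones(t, t, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    # For local heads, additionally mask out keys older than `window` positions.
    idx = torch.arange(t)
    too_old = (idx[:, None] - idx[None, :]) >= window     # query_pos - key_pos
    local_mask = too_old[None] & local_heads[:, None, None]
    scores = scores.masked_fill(local_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v
```

In a serving stack, the payoff is that the local heads' KV entries past the window can be evicted, which is exactly what a footprint-style metric would count.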
How good are LLMs at 🔭 scientific computing and visualization 🔭? AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results. SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
friends at #CHI2025, Karan @realkaranahuja, Yiyue @LuoYiyue, and I are teaching a course on **Multimodal AI for human sensing and interaction**. Come join us and learn about the latest advances in multimodal AI, generative AI, efficient software, and sensing hardware to…

Lots of interest in the recent o3 and o4 models. But as these more advanced multimodal AI systems get better at math, do they also become better intelligent tutors that help students learn math? 🚨Introducing Interactive Sketchpad, an intelligent AI tutor that…
Can LLMs learn to reason better by "cheating"?🤯 Excited to introduce #cheatsheet: a dynamic memory module enabling LLMs to learn + reuse insights from tackling previous problems 🎯Claude3.5 23% ➡️ 50% AIME 2024 🎯GPT4o 10% ➡️ 99% on Game of 24 Great job @suzgunmirac w/ awesome…
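A rough sketch of the dynamic-memory loop as described in the tweet (the `llm` callable and the insight parsing are placeholders, not the paper's API): solve each problem with the current cheatsheet in the prompt, then append any newly stated insight for use on later problems.

```python
# Hedged sketch of a "cheatsheet" loop: the model re-reads insights distilled
# from earlier problems and contributes a new one after each solve.
def solve_with_cheatsheet(problems, llm, max_entries=50):
    """`llm` is any callable mapping a prompt string to a response string (placeholder)."""
    cheatsheet, answers = [], []
    for problem in problems:
        memory = "\n".join(cheatsheet[-max_entries:]) or "(none yet)"
        prompt = (
            "Reusable insights from earlier problems:\n" + memory + "\n\n"
            "Problem:\n" + problem + "\n"
            "Solve it, then end with one line starting with 'Insight:' "
            "stating a reusable lesson."
        )
        response = llm(prompt)
        answers.append(response)
        # Naive insight extraction: keep the last line that starts with "Insight:".
        insights = [line[len("Insight:"):].strip()
                    for line in response.splitlines() if line.startswith("Insight:")]
        if insights:
            cheatsheet.append(insights[-1])
    return answers, cheatsheet
```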
Vision models have been smaller than language models; what if we scale them up? Introducing Web-SSL: a family of billion-scale SSL vision models (up to 7B parameters) trained on billions of images without language supervision, using VQA to evaluate the learned representation…
While today’s multimodal models excel at language-based social tasks, can they understand humans without words? ...not really😶 We introduce MimeQA, a video QA dataset to test AI's nonverbal social intelligence—using mime videos 🤐 Paper: arxiv.org/pdf/2502.16671 🧵1/8
Introducing *ARC‑AGI Without Pretraining* – ❌ No pretraining. ❌ No datasets. Just pure inference-time gradient descent on the target ARC-AGI puzzle itself, solving 20% of the evaluation set. 🧵 1/4
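A toy illustration of the recipe the tweet describes, not the authors' method: start from random weights, run gradient descent only on the target puzzle's own demonstration pairs at inference time, then predict the test output. The tiny conv model, step count, and the assumption that all grids in a puzzle share one size are my simplifications.

```python
# Hedged sketch: no pretraining, no external data; all learning happens on the
# single target puzzle at inference time.
import torch
import torch.nn as nn

def fit_on_single_puzzle(demo_inputs, demo_outputs, test_input,
                         n_colors=10, steps=2000, lr=1e-3):
    """Grids are LongTensors of shape [H, W] with values in [0, n_colors).
    Assumes (for simplicity) that all grids in the puzzle have the same size."""
    model = nn.Sequential(           # tiny per-cell model, purely illustrative
        nn.Conv2d(n_colors, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, n_colors, 3, padding=1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    x = torch.stack([nn.functional.one_hot(g, n_colors).permute(2, 0, 1).float()
                     for g in demo_inputs])
    y = torch.stack(demo_outputs)
    for _ in range(steps):           # gradient descent on this puzzle only
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
    test = nn.functional.one_hot(test_input, n_colors).permute(2, 0, 1).float()
    return model(test[None]).argmax(dim=1)[0]    # predicted output grid
```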
Thrilled that we won an 🥂Outstanding Paper Award at #EMNLP2024! Super validating for using computational methods to investigate discourse processing via QUDs. Super proud of my students @YatingWu96 @ritikarmangla and our amazing team @AlexGDimakis @gregd_nlp
LLMs can mimic human curiosity by generating open-ended inquisitive questions given some context, similar to how humans wonder when they read. But which ones are most important to answer?🤔 We predict the salience of questions, substantially outperforming GPT-4.🌟 🧵1/5
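One way such a salience predictor could be set up, purely as an assumed sketch (the sentence encoder, Ridge regression, and rating scale are my choices, not necessarily the paper's): embed the context and the generated question, then regress onto human salience ratings.

```python
# Hedged sketch: score a generated question's salience from embeddings of the
# (context, question) pair, trained against human ratings.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # any sentence encoder would do

def featurize(contexts, questions):
    # Concatenate context and question embeddings into one feature vector each.
    return np.hstack([encoder.encode(contexts), encoder.encode(questions)])

def fit_salience_model(train_ctx, train_q, train_salience):
    # train_ctx, train_q: lists of strings; train_salience: human ratings (e.g. 1-5).
    model = Ridge(alpha=1.0)
    model.fit(featurize(train_ctx, train_q), train_salience)
    return model
```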
heading to #emnlp2024! would love to chat with those interested in joining our Multisensory Intelligence research group at MIT @medialab @MITEECS media.mit.edu/groups/multise… Our group studies the foundations of multisensory AI to create human-AI symbiosis across scales and sensory…
Excited for #EMNLP2024! Check out work from my students and collaborators that will be presented: jessyli.com/emnlp2024
📣 Announcing the name and theme of my new research group at MIT @medialab @MITEECS: ***Multisensory Intelligence*** media.mit.edu/groups/multise… Our group studies the foundations of multisensory AI to create human-AI symbiosis across scales and sensory mediums. We are hiring at…
I'm excited to announce that our work, 𝐅𝐚𝐜𝐭𝐏𝐈𝐂𝐎, has been accepted to 𝗔𝗖𝗟 𝟮𝟬𝟮𝟰! 🎉🇹🇭 A huge thanks to all amazing collaborators 🚀🫶 #NLProc #ACL2024NLP
LLMs can write impressive-looking summaries of technical texts in plain language. But are they factual? This is critical in medicine, and the answer is tricky! Introducing ⚕️FactPICO, the first **expert** evaluation of this, with explanations. Paper: arxiv.org/abs/2402.11456 🧵1/