Paul Liang
@pliang279
Assistant Professor MIT @medialab @MITEECS @nlp_mit || PhD from CMU @mldcmu @LTIatCMU || Foundations of multisensory AI to enhance the human experience.
This spring I am teaching a new class at MIT called **How to AI (Almost) Anything**. Its name is a play on two seminal @medialab courses: How to Make (Almost) Anything (on design & fabrication) and How to Grow (Almost) Anything (on synthetic biology). We are now in the AI age, and…

This paper is impressive! It introduces a clever way of keeping memory use constant regardless of task length. Great use of RL to train AI agents to use memory and reasoning efficiently. Here are my full notes:
I am very excited about David's @ddvd233 line of work in developing generalist multimodal clinical foundation models. CLIMB (which will be presented at ICML 2025) github.com/DDVD233/climb is a large-scale benchmark comprising 4.51 million patient samples totaling 19.01 terabytes…
Thanks @iScienceLuvr for posting about our recent work! We're excited to introduce QoQ-Med, a multimodal medical foundation model that jointly reasons across medical images, videos, time series (ECG), and clinical texts. Beyond the model itself, we developed a novel training…
Building AI reasoning models with extremely long context lengths - think days, weeks, even years of context - is the next big challenge in AI. That's why I'm extremely excited about the latest work from Ao Qu @ao_qu18465, an incoming PhD student in our group, on MEM1: RL for Memory…
🚀 Excited to share my first tweet and to introduce our latest work: MEM1: RL for Memory Consolidation in Long-Horizon Agents. Long-horizon agents (e.g., deep research, web agents) typically store all observations, actions, and intermediate thoughts in context. However, much of…
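A rough sketch of the idea as I read it from the thread (the actual MEM1 training setup is not shown here): instead of appending every observation and thought to an ever-growing context, the agent keeps a fixed-size consolidated memory that gets rewritten at each step, so context cost stays constant over arbitrarily long horizons. All names below (`llm`, `consolidate`, `run_agent`, `env`) are hypothetical placeholders, not the paper's API.

```python
# Hypothetical sketch of a constant-memory agent loop in the spirit of MEM1.
# The real method trains this consolidation behavior with RL; here the LLM call
# is just a placeholder to show the control flow.

def llm(prompt: str) -> str:
    """Placeholder for any chat/completions call; not MEM1's actual model."""
    raise NotImplementedError

def consolidate(memory: str, observation: str, thought: str) -> str:
    """Rewrite the compact memory state instead of appending raw history."""
    return llm(
        "Merge the new observation and reasoning into the existing memory.\n"
        f"Memory: {memory}\nObservation: {observation}\nThought: {thought}\n"
        "Return an updated memory of roughly the same length."
    )

def run_agent(task: str, env, max_steps: int = 100) -> str:
    memory = f"Task: {task}"          # fixed-size state, not a growing transcript
    observation = env.reset()
    for _ in range(max_steps):
        thought = llm(f"Memory: {memory}\nObservation: {observation}\nThink, then pick an action.")
        action = llm(f"Memory: {memory}\nThought: {thought}\nAction:")
        observation, done = env.step(action)
        # Context stays O(1) in the number of steps: old turns are folded
        # into `memory` rather than kept verbatim.
        memory = consolidate(memory, observation, thought)
        if done:
            break
    return memory
```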
Some updates 🚨 I finished my Ph.D. at @uwcse in June 2025! After a year at AI2 as a Research Scientist, I am joining CMU @LTIatCMU & @mldcmu (courtesy) as an Assistant Professor in Fall 2026. The journey, acknowledgments & recruiting in 🧵
We will present the work TODAY at 4:30 PM at West Hall #421 with a huge poster! Come visit us!
Excited to share our latest benchmark: CLIMB, where we built a solid data foundation for multimodal clinical models. With 4.51M patient samples, totaling 19.01 TB of data across 13 domains, it's currently the largest public clinical benchmark! Paper: arxiv.org/abs/2503.07667 Code:…
🧵1/10 LLMs can answer in many languages. But do they think in them? Even when prompted in Swahili or Thai, models often switch to English for reasoning. This breaks interpretability and trust. So we ask: Can LLMs reason in the input language?
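One simple way to quantify the switching behavior described in the thread (my own illustrative check, not necessarily the paper's protocol) is to run language identification over the model's reasoning trace and compare it to the prompt language:

```python
# Illustrative check: does the chain-of-thought stay in the prompt's language?
# Uses the `langdetect` package (pip install langdetect); the paper's actual
# evaluation may differ.
from langdetect import detect

def reasoning_language_match(prompt_lang: str, reasoning_trace: str) -> bool:
    """Return True if the detected language of the reasoning matches the prompt language code."""
    return detect(reasoning_trace) == prompt_lang

# Example: a Swahili ("sw") prompt whose reasoning drifted into English would fail this check.
print(reasoning_language_match("sw", "First, I need to compute the total cost..."))  # False
```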
We’re thrilled to announce that Media Lab alumni Karrie Karahalios (@kkarahal) and Pat Pataranutaporn (@patpat_mit) are joining the Media Lab faculty on September 1! media.mit.edu/posts/mit-medi…
Are we fact-checking medical claims the right way? 🩺🤔 Probably not. In our study, even experts struggled to verify Reddit health claims using end-to-end systems. We show why—and argue fact-checking should be a dialogue, with patients in the loop arxiv.org/abs/2506.20876 🧵1/
What would a World Model look like if we start from a real embodied agent acting in the real world? It has to have: 1) A real, physically grounded and complex action space—not just abstract control signals. 2) Diverse, real-life scenarios and activities. Or in short: It has to…
Check out our growing open-source contribution, MultiNet v0.2 - a comprehensive benchmark for training and evaluating multimodal vision-language-action models on agentic and embodied tasks. Think multimodal robotics and AI agent platforms - but with all data…
Incredibly excited to announce the release of MultiNet v0.2 - a major update to our comprehensive open-source benchmark suite for evaluating Multimodal Models on Action tasks. Read on for several paper announcements, details on the evaluation harness and platform, and more!…
Most problems have clear-cut instructions: solve for x, find the next number, choose the right answer. Puzzlehunts don’t. They demand creativity and lateral thinking. We introduce PuzzleWorld: a new benchmark of puzzlehunt problems challenging models to think creatively.
Led by Prof. @pliang279, the Multisensory Intelligence group at the MIT Media Lab studies the foundations of multisensory artificial intelligence to create human-AI symbiosis across scales and sensory mediums. The group’s members draw upon their multidisciplinary backgrounds to…
Despite much progress in AI, the ability for AI to 'smell' like humans remains elusive. Smell AIs 🤖👃 can be used for allergen sensing (e.g., peanuts or gluten in food), hormone detection for health, safety & environmental monitoring, quality control in manufacturing, and more.…
Lots of interest in AI reasoning, but most use cases involve structured inputs (text) with automatic and objective verifiers (e.g. coding, math). @lmathur_'s latest work takes an ambitious step towards social reasoning in AI, a task where inputs are highly multimodal (verbal and…
Future AI systems interacting with humans will need to perform social reasoning that is grounded in behavioral cues and external knowledge. We introduce Social Genome to study and advance this form of reasoning in models! New paper w/ Marian Qian, @pliang279, & @lpmorency!
QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training "we introduce QoQ-Med-7B/32B, the first open generalist clinical foundation model that jointly reasons across medical images, time-series signals, and text reports. QoQ-Med is trained with…
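The tweet is cut off before the training details, but going off the title alone, here is a guess at what a domain-aware variant of GRPO's advantage computation could look like: rewards are normalized relative to other rollouts from the same clinical domain, so easier domains don't dominate the policy gradient. This is my own sketch of the idea, not the paper's implementation; the data layout and grouping key are assumptions.

```python
# Hypothetical sketch of domain-aware, GRPO-style advantage computation.
# GRPO normalizes each rollout's reward against the group it was sampled with;
# here the grouping additionally keys on the clinical domain (e.g., ECG vs.
# chest X-ray). A guess at the idea in the paper's title, not its algorithm.
from collections import defaultdict
import statistics

def domain_aware_advantages(rollouts):
    """rollouts: list of dicts with 'domain' (str) and 'reward' (float)."""
    by_domain = defaultdict(list)
    for r in rollouts:
        by_domain[r["domain"]].append(r["reward"])

    advantages = []
    for r in rollouts:
        rewards = by_domain[r["domain"]]
        mean = statistics.fmean(rewards)
        std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
        advantages.append((r["reward"] - mean) / std)
    return advantages

# Example: a modest reward in a hard domain can still earn a positive advantage.
rollouts = [
    {"domain": "ecg", "reward": 0.2},
    {"domain": "ecg", "reward": 0.1},
    {"domain": "cxr", "reward": 0.90},
    {"domain": "cxr", "reward": 0.95},
]
print(domain_aware_advantages(rollouts))
```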