Melissa Pan

@melissapan

CS PhD @UCBerkeley Sky Lab 🐻 Systems & AI & Sustainability 🌍 Prev: @google, @ibm, @CarnegieMellon🐕‍🦺, @UofT🇨🇦

Berkeley

Joined October 2023

480 Following

1K Followers

Pinned

Melissa Pan@melissapan · Apr 25

🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️ 🔥 Introducing MAST: The first multi-agent failure taxonomy - consists of 14 failure modes and 3 categories, generalizes for diverse multi-agent systems and tasks! Paper: arxiv.org/pdf/2503.13657 Code: github.com/multi-agent-sy… 🧵1/n

209

133

36.0K

Pinned

Melissa Pan@melissapan · Jun 23

very inspirational

Andy Konwinski@andykonwinski · Jun 23

Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity. Built for and by researchers, including @JeffDean & @jpineau1 on the board, @LaudeInstitute catalyzes research with real-world impact.

1.0K

Melissa Pan@melissapan · Jul 12

Awesome read on Lucene's implementation of ACORN-1🔥🔥 Filtered vector search is everywhere! Efficient, general-purpose (predicate-agnostic) indices that can support those use cases are super, super powerful!! Try it out & check out our original paper dl.acm.org/doi/10.1145/36…

Doug Turnbull@softwaredoug · Apr 14

Elasticsearch / Lucene adopts ACORN-1, which expands the exploration of nodes to ensure enough candidates that meet the filter By @benwtrent elastic.co/search-labs/bl…

5.0K

Melissa Pan@melissapan · Jun 26

We at @NovaSkyAI have been hacking on RL across the stack—algorithms, envs, perf optimization. But progress is slowed by RL frameworks with tightly-coupled components that lack interfaces. To fill this gap, we upgraded SkyRL into a highly-modular RL framework. Check it out!!

NovaSky@NovaSkyAI · Jun 26

✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easily prototype new algorithms, environments, and training logic with minimal overhead. 🧵👇 Blog: novasky-ai.notion.site/skyrl-v01 Code: github.com/NovaSky-AI/Sky…

5.0K

Melissa Pan@melissapan · Jun 14

Multi-agent outperforming single-agent by 90.2% is very interesting. One reason we haven't seen multi-agents winning is that existing benchmarks are rather "simple." This makes multi-agents seem more like a PoC than a necessity, which is not a true reflection of MAS's capability.

Anthropic@AnthropicAI · Jun 13

New on the Anthropic Engineering blog: how we built Claude’s research capabilities using multiple agents working in parallel. We share what worked, what didn't, and the engineering challenges along the way. anthropic.com/engineering/bu…

1.0K

637

140.0K

Melissa Pan Retweeted

uccl_project@uccl_proj · Jun 12

1/N 📢 Introducing UCCL (Ultra & Unified CCL), an efficient collective communication library for ML training and inference, outperforming NCCL by up to 2.5x 🚀 Code: github.com/uccl-project/u… Blog: uccl-project.github.io/posts/about-uc… Results: AllReduce on 6 HGX across 2 racks over RoCE RDMA

7.0K

Melissa Pan Retweeted

Mir Miroyan@mirmiroyan · Jun 6

We release Search Arena 🌐 — the first large-scale (24k+) dataset of in-the-wild user interactions with search-augmented LLMs. We also share a comprehensive report on user preferences and model performance in the search-enabled setting. Paper, dataset, and code in 🧵

233

181

42.0K

Melissa Pan@melissapan · May 22

Excited to share SkyRL-SQL, a simple yet effective multi-turn RL pipeline for training LLMs to generate and refine SQL through real database feedback. Rather than one-shot generation, models explore unfamiliar schemas, issue trial queries, reflect on results, and iteratively…

NovaSky@NovaSkyAI · May 22

1/N Introducing SkyRL-SQL, a simple, data-efficient RL pipeline for Text-to-SQL that trains LLMs to interactively probe, refine, and verify SQL queries with a real database. 🚀 Early Result: trained on just ~600 samples, SkyRL-SQL-7B outperforms GPT-4o, o4-mini, and SFT model…

5.0K

Melissa Pan@melissapan · May 22

very inspiring research dissertation @lisabdunlap

2.0K

Melissa Pan Retweeted

Data Science Dojo@DataScienceDojo · Apr 28

Multi-agent LLM systems are exciting, but why do they so often fall short of their promise? A new paper from UC Berkeley, "Why Do Multi-Agent LLM Systems Fail?", offers one of the first systematic answers. The authors introduce MAST (Multi-Agent System Failure Taxonomy),…

2.0K

Melissa Pan Retweeted

NovaSky@NovaSkyAI · May 7

1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks like SWE-Bench. We also open-source a series of our early trained models to showcase the potential of end-to-end online RL training on long-horizon (20-50…

272

175

90.0K

Melissa Pan@melissapan · May 5

Real world AI pipelines are often compound, multi-module, and multi-step programs—unlike most RL/GRPO implementations today which optimize a single agent. 🚨 Super excited to release dspy.GRPO, which lets you GRPO tune any arbitrary multi-module, multi-step DSPy program, with…

Omar Khattab@lateinteraction · May 5

So many things in the run-up to DSPy 3. Here's a first, EXPERIMENTAL one: 🚨We're releasing dspy.GRPO, an online RL optimizer for DSPy programs Your DSPy code as-is can be dspy.GRPO'ed. Yes, even compound multi-module programs. Led by @NoahZiems @LakshyAAAgrawal @dilarafsoylu.

7.0K

Melissa Pan Retweeted

CSGE@berkeley_csge · May 2

Berkeley CS Grad Entrepreneurs' Annual Mixer & After Party is happening TODAY at Databricks SF🌉 Excited to host PhDs, faculty, and alumni for an evening of research x startups, featuring panelists: John Schulman, Denis Yarats, Alex Dimakis and moderator Andy Konwinski (1/n)

3.0K

Melissa Pan@melissapan · May 2

Best tweet i read today🤣🤣🤣 MAST as a practical tool for hiring 😉

sure, ai@sureailabs · Apr 11

Ask me what I do at work and I will send this paper. This journal article is most of my job description. arXiv:2503.13657 (cs) [Submitted on 17 Mar 2025] Why Do Multi-Agent LLM Systems Fail? arxiv.org/abs/2503.13657

893

Melissa Pan@melissapan · Apr 29

Super cool paper on the failure modes of 'multi-agent' LM systems. But I'm curious and willing to change my mind on why people expect such systems to become useful. What's the hypothesis behind setting up shallow copies of the same LLM and *just* asking them to talk? For…

Joey Gonzalez@profjoeyg · Apr 29

Multi-agent systems are supposed to provide a framework for decomposing problems and a mechanism to incorporate competing objectives. Yet, despite the significant progress in AI and reasoning, useful multi-agent systems remain the future (and not the present). Why don't…

105

26.0K

Melissa Pan@melissapan · Apr 29

Very productive conversations with @melissapan @IntuitMachine @sh_reya @tonychenxyz @cyrusnewday. My tl;dr -> There are at least 4 different concepts here, and it's essential to study them separately. 1) Structured programming to fully express your intent or control on the…

Omar Khattab@lateinteraction · Apr 29

11.0K