Bhavya Chopra
@BhavyaChopra1
CS PhD Student @UCBerkeley • Research Intern @Tableau • previously Research Fellow @Microsoft @ProseMSFT; undergrad @IIITDelhi • HCI+Data research
Thrilled to share that I have started my PhD at @UCBerkeley with Prof. Aditya Parameswaran @adityagp! My research will focus on using human-centered approaches to assist data scientists, developers, and end-users with their data needs. Excited for the journey ahead!

I went down a rabbit hole to figure out how to map coordinates to cities and states without an API. I ended up making a library and putting it on NPM.
Want to build robust data processing agents 📄🤖? @sh_reya presents some of the latest ideas from her lab @UCBerkeley! - Many failure modes around data understanding, evals must be task specific - Intent gap between what the user asks and what they want. DocETL uses human…
🚨 📢 Releasing BARGAIN: Guaranteed Accurate AI for Less 💰 BARGAIN reduces LLM-powered data processing costs by using cheaper LLMs (e.g., gpt-4o-mini) on data records they are accurate on 🎯 It *theoretically guarantees* the output matches the expensive LLM's output (e.g., gpt-4o)
Real world AI pipelines are often compound, multi-module, and multi-step programs—unlike most RL/GRPO implementations today which optimize a single agent. 🚨 Super excited to release dspy.GRPO, which lets you GRPO tune any arbitrary multi-module, multi-step DSPy program, with…
So many things in the run-up to DSPy 3. Here's a first, EXPERIMENTAL one: 🚨We're releasing dspy.GRPO, an online RL optimizer for DSPy programs Your DSPy code as-is can be dspy.GRPO'ed. Yes, even compound multi-module programs. Led by @NoahZiems @LakshyAAAgrawal @dilarafsoylu.
Ask me what I do at work and I will send this paper. This journal article is most of my job description. arXiv:2503.13657 (cs) [Submitted on 17 Mar 2025] Why Do Multi-Agent LLM Systems Fail? arxiv.org/abs/2503.13657
We've been working on extracting information in templatized PDFs for the last couple of years, leveraging the best of LLMs and classical data extraction techniques. Our latest technique, TWIX, has the best of all worlds: beats Azure DI, AWS Textract, or LLM-based approaches by…
NUDGE was at #ICLR2025 last week. It's a simple and lightweight fine-tuning method for pre-trained embeddings. It boosts retrieval accuracy by more than 10% in minutes! A no-brainer for your RAG pipelines; you should optimize pre-trained models to your data
Multi-agent systems are supposed to provide a framework for decomposing problems and a mechanism to incorporate competing objectives. Yet, despite the significant progress in AI and reasoning, useful multi-agent systems remain the future (and not the present). Why don't…
🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️ 🔥 Introducing MAST: The first multi-agent failure taxonomy - consists of 14 failure modes and 3 categories, generalizes for diverse multi-agent systems and tasks! Paper: arxiv.org/pdf/2503.13657 Code: github.com/multi-agent-sy… 🧵1/n
One of the main goals I had while building out multilspy (aka.ms/multilspy) was that eventually LLMs would be able to tool-call LSPs. Happy to see steps in this direction: Check out MultilspyMCP (playbooks.com/mcp/asimihsan-…), which provides an MCP implementation over multilspy!
Why Do Multi-Agent LLM Systems “still” Fail? A new study explores why multi-agent systems are not significantly outperforming single-agent ones. The study identifies 14 failure modes of multi-agent systems. Multi-agent systems (MAS) are agents that interact, communicate, and collaborate to…
Oh no, why do they fail?! I was promised they would break the ground if I ran a swarm of them! Not fair, @berkeley_ai!
🔍Do you use platforms like #Kaggle, #HuggingFace, #Census, etc. to find datasets? We would love to hear your experiences as you discover relevant datasets in a brief user-study session! If interested, please respond to our survey: forms.gle/vfee9r2AEXNJ4r… RTs appreciated!😁
🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs! We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.
Introducing 📜DocWrangler: an open-source IDE for AI-powered data processing with built-in prompt engineering guidance and output inspection tools. Code: github.com/ucbepic/docetl Blog: data-people-group.github.io/blogs/2025/01/… Free research preview: docetl.org/playground Built @ Berkeley (1/7)
☀️Excited to share our new paper! Generative AI agents are powerful but complex—how do we design them for transparency and human control? 🤖✨ At the heart of this challenge is establishing common ground, a concept from human communication. Our new paper identifies 12 key…
I am attending EMNLP🌴in-person at Miami, to present MetaReflection. Please drop by at the Riverfront Hall at the 11AM Poster session on Nov 13th (today!) to know more about it or chat about Language agents and Neuro-symbolic AI at large!
Thanks to @mikeusachenko, multilspy (github.com/microsoft/mult…) now supports easily building language server clients for JavaScript and TypeScript! If you are looking for a framework to build your Agent Computer Interface on, consider using multilspy!
LLMs have made exciting progress on hard tasks! But they still struggle to analyze complex, unstructured documents (including today's Gemini 1.5 Pro 002). We (UC Berkeley) built 📜DocETL, an open-source, low-code system for LLM-powered data processing: data-people-group.github.io/blogs/2024/09/…