Matei Zaharia
@matei_zaharia
CTO at @Databricks and CS prof at @UCBerkeley. Working on data+AI, including @ApacheSpark, @DeltaLakeOSS, @MLflow, http://DSPy.ai. http://linkedin.com/in/mateizaharia
Excited to launch Agent Bricks, a new way to build auto-optimized agents on your tasks. Agent Bricks uniquely takes a *declarative* approach to agent development: you tell us what you want, and we auto-generate evals and optimize the agent. databricks.com/blog/introduci…
This is a good opportunity to announce that I recently joined the research team at @databricks, where I will be working alongside @jefrankle, @rishabhs, @matei_zaharia, Erich Elsen, and many others on the hardest problems at the intersection of information retrieval and AI.
I'm at ICML 🇨🇦 and I'm hiring at @databricks. Visit our booth if you're interested. My scientific focus: It's 1972 in AI, there's an AI crisis, Dijkstra isn't here to save us, and maybe RL can. Why Databricks? The long road to AGI is being paved here and we have the real evals 🧵
The SkyRL roadmap is live! Our focus is on building the easiest-to-use high-performance RL framework for agents. We'd love your ideas, feedback, or code to guide the project: github.com/NovaSky-AI/Sky…
Does RL actually learn positively under random rewards when optimizing Qwen on MATH? Is Qwen really so magical that even RL on random rewards makes it reason better? Following prior work on spurious rewards in RL, we ablated the algorithms. It turns out that if you…
Recent work has seemed somewhat magical: how can RL with *random* rewards make LLMs reason? We pull back the curtain on these claims and find that this unexpected behavior hinges on the inclusion of certain *heuristics* in the RL algorithm. Our blog post: tinyurl.com/heuristics-con…
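A minimal sketch of one way this can happen, assuming GRPO-style group normalization (an assumption for illustration, not necessarily the exact algorithm the post ablates): within a group of rollouts, even coin-flip rewards produce nonzero normalized advantages, so the policy still receives a gradient whose direction is set by algorithmic heuristics (normalization, clipping) rather than by task correctness.

```python
def group_normalized_advantages(rewards, eps=1e-6):
    # GRPO-style advantage: (reward - group mean) / (group std + eps).
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One draw of purely random (coin-flip) rewards for a rollout group.
rewards = [1.0, 0.0, 1.0, 0.0]
advs = group_normalized_advantages(rewards)
# The advantages come out roughly +1 / -1: nonzero signal even though
# the rewards themselves carry no information about the task.
```

The point of the sketch: the normalization step manufactures per-sample signal out of noise, and which samples get pushed up or down is then decided by the algorithm's heuristics.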
We're finding that what's needed in RL for enterprise tasks is quite different from what's needed in foundation model training on math, code, etc. Catch @jefrankle and our team at ICML to talk about these problems!
Properties of our problems:
* Semi-verifiability. Can LLM judges productively augment RLVR? How clean must they be?
* Intermediate rewards. Signals we can exploit to make harder tasks tractable.
* Real traces. Tons of human traces for imitation learning or environment building.
The #SIGIR2025 Best Paper was just awarded to the WARP engine for fast late interaction! Congrats to Luca Scheerer 🎉 WARP was his @ETH_en MS thesis, completed while visiting us at @StanfordNLP. Incidentally, it's the fifth Paper Award for a ColBERT paper since 2020!* Luca did an…
📢 If you’re at #SIGIR2025 this week, make sure to be at Luca Scheerer’s paper talk: “WARP: An Efficient Engine for Multi-Vector Retrieval” (Wednesday 11am) WARP makes PLAID, the famous ludicrously fast ColBERT engine, another 3x faster on CPUs. With the usual ColBERT quality!
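For readers new to multi-vector retrieval, the "late interaction" that WARP and PLAID serve efficiently is ColBERT's MaxSim scoring: each query token vector is matched to its most similar document token vector, and those maxima are summed. A toy sketch (real engines vectorize this and add pruning/compression):

```python
def late_interaction_score(q_vecs, d_vecs):
    # ColBERT-style MaxSim: for each query token embedding, take the max
    # dot product over all document token embeddings, then sum.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in d_vecs) for q in q_vecs)

q = [[1.0, 0.0], [0.0, 1.0]]              # two query token vectors
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]  # three document token vectors
score = late_interaction_score(q, doc)     # 0.9 + 0.8 = 1.7
```

The engineering challenge WARP tackles is doing this interaction over millions of documents fast, which is where the 3x CPU speedup over PLAID comes in.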
Come find me at the poster session now at #SIGIR2025! Let’s chat about LLM-based relevance judgments! @SIGIRConf
Yes, this is a description of how the dspy.SIMBA optimizer works. > a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc. and the lessons from this stage feel explicit, like a new string to be added to the system…
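A hypothetical sketch of that review/reflect loop (the names `reflect`, `optimize_prompt`, and the `llm` callable are illustrative, not dspy.SIMBA's actual API):

```python
def reflect(trace, reward, llm):
    # Ask an LLM: what went well, what didn't, what to try next time?
    prompt = (
        "Here is an agent trace and its reward.\n"
        f"Trace: {trace}\nReward: {reward}\n"
        "What went well? What didn't go so well? What should be tried "
        "next time? Reply with one short, actionable lesson."
    )
    return llm(prompt)

def optimize_prompt(system_prompt, episodes, llm):
    # Each lesson is an explicit new string appended to the system prompt,
    # mirroring the review/reflect stage described above.
    for trace, reward in episodes:
        system_prompt += "\n- Lesson: " + reflect(trace, reward, llm)
    return system_prompt

# Usage with a stubbed-out LLM:
stub_llm = lambda prompt: "cite sources before answering"
new_prompt = optimize_prompt("You are a helpful agent.",
                             [("trace-1", 0.2)], stub_llm)
```

The design point is that the lessons are explicit strings a human can read and audit, rather than opaque weight updates.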
Scaling up RL is all the rage right now; I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
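That truncated intuition, "this went well, so slightly raise its probability," is the REINFORCE update in a sentence. A toy sketch over a softmax policy (illustrative only, not any lab's training code):

```python
import math

def reinforce_update(logits, action, reward, baseline, lr=0.1):
    # Softmax policy over discrete actions; nudge the taken action's logit
    # up if the reward beat the baseline, down otherwise.
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    advantage = reward - baseline
    # Gradient of log pi(action) w.r.t. logit_i is (1[i == action] - probs[i]).
    return [
        l + lr * advantage * ((1.0 if i == action else 0.0) - probs[i])
        for i, l in enumerate(logits)
    ]

logits = [0.0, 0.0, 0.0]
new_logits = reinforce_update(logits, action=1, reward=1.0, baseline=0.0)
# Action 1's logit goes up; the others drift down slightly.
```

It only ever redistributes probability toward behaviors already sampled and rewarded, which is one reason to doubt it's the full story.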
Awesome read on Lucene's implementation of ACORN-1🔥🔥 Filtered vector search is everywhere! Efficient, general-purpose (predicate-agnostic) indices that can support those use cases are super, super powerful!! Try it out & check out our original paper dl.acm.org/doi/10.1145/36…
Elasticsearch / Lucene adopts ACORN-1, which expands the exploration of nodes to ensure enough candidates that meet the filter By @benwtrent elastic.co/search-labs/bl…
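A toy sketch of the ACORN-1 idea (not Lucene's implementation; names and the single-level graph are simplifications): traverse only nodes that pass the filter, but when a direct neighbor fails the predicate, look one extra hop through it, so the search isn't cut off by runs of filtered-out nodes.

```python
from heapq import heappush, heappop

def filtered_search(graph, vectors, dist, query, entry, matches, k):
    # Best-first search over a proximity graph; entry is assumed to pass
    # the filter. Only matching nodes join the frontier and results.
    visited = {entry}
    frontier = [(dist(query, vectors[entry]), entry)]
    results = []
    while frontier:
        d, node = heappop(frontier)
        results.append((d, node))
        candidates = []
        for nb in graph[node]:
            if matches(nb):
                candidates.append(nb)
            else:
                # ACORN-1-style expansion: route through the filtered-out
                # neighbor to its own neighbors that do pass the filter.
                candidates.extend(n2 for n2 in graph[nb] if matches(n2))
        for nb in candidates:
            if nb not in visited:
                visited.add(nb)
                heappush(frontier, (dist(query, vectors[nb]), nb))
    return sorted(results)[:k]

# Path graph 0-1-2-3-4 with a filter keeping only even nodes: every route
# to node 4 runs through filtered-out odd nodes, so without the expansion
# the search would stall at node 0.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
vectors = {i: float(i) for i in range(5)}
hits = filtered_search(graph, vectors, lambda q, v: abs(q - v),
                       4.0, 0, lambda n: n % 2 == 0, 2)
```

Real implementations bound the traversal with a beam width; the sketch omits that to keep the expansion step visible.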
Separation of storage and compute doesn't sacrifice database performance. You can have both elasticity AND performance: neon.com/blog/separatio…
🔎 SkyRL + Search-R1 Training a multi-turn search agent doesn’t have to be complicated. With SkyRL, reproducing the SearchR1 recipe at high training throughput is quick and easy! We wrote up a detailed guide to show you how: novasky-ai.notion.site/skyrl-searchr1 1/N 🧵
I'm very excited to share some new work arxiv.org/abs/2506.06488. This work started out in conversations with @thorn where we realized that shadow model MIAs couldn't be used to audit models for harmful content of children. See 🧵 for why, and our progress on solving this...
As AI agents near real-world use, how do we know what they can actually do? Reliable benchmarks are critical, but agentic benchmarks are broken! Example: WebArena marks "45+8 minutes" on a duration calculation task as correct (real answer: "63 minutes"). Other benchmarks…
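An illustrative sketch of that failure mode (not WebArena's actual checker; `fuzzy_grade` and `strict_grade` are hypothetical names): a lenient token-overlap grader can mark the unevaluated expression "45+8 minutes" correct, while a strict grader parses out a single number and compares it.

```python
import re

def fuzzy_grade(prediction, gold):
    # Lenient: pass if any gold token appears in the prediction. This is
    # the kind of check that lets wrong answers slip through.
    return any(tok in prediction for tok in gold.split())

def strict_grade(prediction, gold_minutes):
    # Strict: the prediction must contain exactly one number, and it must
    # equal the gold duration. "45+8 minutes" is not an evaluated answer.
    nums = re.findall(r"\d+", prediction)
    if len(nums) != 1:
        return False
    return int(nums[0]) == gold_minutes

fuzzy_grade("45+8 minutes", "63 minutes")   # passes on the "minutes" token
strict_grade("45+8 minutes", 63)            # rejected
strict_grade("63 minutes", 63)              # accepted
```

The broader point: when the grader is weaker than the task, benchmark scores measure the grader, not the agent.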
I will be at #ICML next week! It'll be great to catch up with friends, old and new. Happy to chat about our work on Data + AI at @DbrxMosaicAI. We're growing our team and have openings for researchers and engineers in areas such as document intelligence, knowledge assistant, data…
We just published a Databricks App template that shows how to:
- Deploy a LangGraph agent as a Databricks app with a chat UI
- Automatically monitor MLflow 3.0 traces on Databricks (including syncing to delta tables, with Unity Catalog governance of traces)
I've also embedded my…
If you:
- Are an early-stage startup
- Have raised up to $5M in venture funding
- Are using Postgres
Apply to our Startup Program and get up to $100k in credits: neon.com/startups
✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easily prototype new algorithms, environments, and training logic with minimal overhead. 🧵👇 Blog: novasky-ai.notion.site/skyrl-v01 Code: github.com/NovaSky-AI/Sky…
1/N 📢 Introducing UCCL (Ultra & Unified CCL), an efficient collective communication library for ML training and inference, outperforming NCCL by up to 2.5x 🚀 Code: github.com/uccl-project/u… Blog: uccl-project.github.io/posts/about-uc… Results: AllReduce on 6 HGX across 2 racks over RoCE RDMA
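For context on what any CCL (UCCL, NCCL) computes in an AllReduce: every rank ends up with the elementwise sum of all ranks' vectors. A single-process toy of the classic ring algorithm (a sketch of the semantics, not UCCL's implementation): the vector is split into n chunks so each link carries ~1/n of the data per step, which is why the ring layout is bandwidth-efficient.

```python
def ring_allreduce(rank_vectors):
    # Simulate a ring AllReduce over n "ranks" in one process.
    n = len(rank_vectors)
    assert n > 1 and all(len(v) % n == 0 for v in rank_vectors)
    m = len(rank_vectors[0]) // n
    # data[r][c] = rank r's copy of chunk c.
    data = [[v[c * m:(c + 1) * m] for c in range(n)] for v in rank_vectors]

    def accumulate(dst, src):
        for i in range(len(dst)):
            dst[i] += src[i]

    # Reduce-scatter: after n-1 steps, rank r fully owns reduced chunk (r+1) % n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n
            accumulate(data[(r + 1) % n][c], data[r][c])

    # All-gather: circulate each fully reduced chunk around the ring.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n
            data[(r + 1) % n][c] = list(data[r][c])

    return [sum(data[r], []) for r in range(n)]

out = ring_allreduce([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]])
# Each rank ends with the elementwise sum [6.0, 8.0, 10.0, 12.0].
```

Libraries like UCCL compete on how fast they move these chunks over real fabrics (RoCE RDMA, multi-rack topologies), not on the math itself.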
This is a pretty good article on how we are rethinking OLTP databases with Lakebase!
Are OLTP databases due for a radical rethink? @AlexWilliams reports from the @Databricks Data + AI Summit, where @rxin made the case for decoupling compute and storage in Postgres — treating data more like code. thenewstack.io/new-oltp-postg…