PatronusAI
@PatronusAI
powerful ai evaluation and optimization 🦄 sign up: http://app.patronus.ai
1/ 🔥🔥 Big news: We’re launching Percival, the first AI agent that can evaluate and fix other AI agents! 🤖 Percival is an evaluation agent that doesn’t just detect failures in agent traces — it can fix them. Percival outperformed SOTA LLMs by 2.9x on the TRAIL dataset,…
At @PatronusAI, we're excited to publish a new article on the best practices for using AI agent platforms. 🚀 In this article, you will learn about various AI agent platforms like @n8n_io, @make_hq, @LangChainAI, @crewAIInc, and @huggingface smolagents. The article provides…
Meet @snigdhabanda our FDE Lead! 🎉 Snigdha has been on our team since this past November and is a valued leader, teammate, and friend. Today, we wanted to highlight her work as a Forward-Deployed Engineer, a role that is becoming one of the most sought-after jobs in tech, with…

At @PatronusAI, we're excited to publish a new article on the best practices for Agentic Workflow. 🚀 In this article, you will learn about agentic workflows, which involve specialized AI agents collaborating to solve complex problems without human intervention, and their…
Using your best AI debugger just got easier -- spotlighting our Percival Integrations! 🛡️ Percival is a highly intelligent agent and AI debugger, capable of detecting 20+ failure modes in agentic traces and suggesting optimizations. Generally, these agentic systems run into…

Today, MLCommons is announcing a new collaboration with contributors from across academia, civil society, and industry to co-develop an open agent reliability evaluation standard to operationalize trust in agentic deployments. 1/3 🔗 mlcommons.org/2025/06/ares-a…
We’re up to exciting things here at Patronus AI, working at the forefront of AI optimization and evaluation! Recently, we launched Percival, a SOTA AI Agent debugger, and have previously released industry-standard benchmarks for agents like TRAIL and BLUR, as well as…
Thank you, Professor @Zhou_Yu_AI and @bklsummithouse, for the AI Agents in Action: Industry × Academia Exchange! @rebeccatqian, our CTO, was on a panel with Vinay Rao (Advisor at @AnthropicAI), @ShunyuYao12 (Research Scientist at @OpenAI), Robert Parker (Founder of Perceptix),…

Welcome Peng Wang to the team! 🎉 Peng joins Patronus AI as Head of Applied Research. Previously, he was Head of Research at Grammarly, Head of AI at AlphaSense, and an ML Engineer at Google Research. Peng’s research interests include: LLM personalization and contextualization,…

Spotlighting our newest open-source agent benchmark: TRAIL 🥳 Grounded in multi-step evals and real-world agentic issues, we created a novel taxonomy of 20+ agentic error types spanning reasoning, planning, and system execution. Following this taxonomy, we build on…

Today, we are proud to share our latest customer story: CARIAD of Volkswagen Group. @cariad_tech, @VW's software organization, leverages @PatronusAI's advanced evaluation tools to continuously enhance its in-vehicle AI assistants. When we first met the AI team at CARIAD, we…
Welcome Hersh Mehta to the team! 🎉 Hersh joins us as a Staff Research Engineer. Previously, he led research engineering efforts across companies like @Meta, Cruise, and @Uber. This included zero-to-one AI/ML research projects in self-driving, virtual reality, and other…

To make sure your AI agent is not bullshitting you, you need to evaluate its reasoning... but to do so automatically, you need an LLM... 🤔 so how do you evaluate the trace evaluator? With TRAIL, which contains: - a full taxonomy of agent errors and most frequent failure cases,
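For anyone who wants to poke at TRAIL directly, here is a minimal sketch of pulling the benchmark from the Hugging Face Hub with the `datasets` library. The repo ID `PatronusAI/TRAIL` and the idea that each example is an annotated agent trace are assumptions based on the tweets above; check the dataset card for the actual ID, splits, and schema.

```python
# Minimal sketch: load the TRAIL benchmark and inspect its annotated agent traces.
# Assumptions: the dataset is published as "PatronusAI/TRAIL" on the Hugging Face Hub
# and each row carries a serialized agent trace plus error-taxonomy labels;
# consult the dataset card for the real repo ID, split names, and fields.
from datasets import load_dataset

trail = load_dataset("PatronusAI/TRAIL")   # hypothetical repo ID
split = next(iter(trail.values()))         # grab whichever split exists

print(split.column_names)                  # inspect the actual schema first
print(split[0])                            # one annotated agent trace with its error labels
```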
What if you had an Agent tailor-made for debugging and improving your Agent? I am SUPER EXCITED to publish our newest Weaviate Podcast featuring Anand Kannappan (@anandnk24), Co-Founder of Patronus AI (@PatronusAI)! 🎙️🔥 The Patronus AI team has just launched Percival!…
Check out the very cool work from our friends @PatronusAI 🔥 See it here! huggingface.co/spaces/Patronu…