PatronusAI
@PatronusAI
powerful ai evaluation and optimization 🦄 sign up: http://app.patronus.ai
1/ 🔥🔥 Big news: We’re launching Percival, the first AI agent that can evaluate and fix other AI agents! 🤖 Percival is an evaluation agent that doesn’t just detect failures in agent traces — it can fix them. Percival outperformed SOTA LLMs by 2.9x on the TRAIL dataset,…
At @PatronusAI, we're excited to publish a new article on the best practices for using AI agent platforms. 🚀 In this article, you will learn about various AI agent platforms like @n8n_io, @make_hq, @LangChainAI, @crewAIInc, and @huggingface smolagents. The article provides…
Meet @snigdhabanda our FDE Lead! 🎉 Snigdha has been on our team since this past November and is a valued leader, teammate, and friend. Today, we wanted to highlight her work as a Forward-Deployed Engineer, a role that is becoming one of the most sought-after jobs in tech, with…

At @PatronusAI, we're excited to publish a new article on the best practices for Agentic Workflow. 🚀 In this article, you will learn about agentic workflows, which involve specialized AI agents collaborating to solve complex problems without human intervention, and their…
Using your best AI debugger just got easier -- spotlighting our Percival Integrations! 🛡️ Percival is a highly intelligent agent and AI debugger, capable of detecting 20+ failure modes in agentic traces and suggesting optimizations. Generally, these agentic systems run into…

Today, MLCommons is announcing a new collaboration with contributors from across academia, civil society, and industry to co-develop an open agent reliability evaluation standard to operationalize trust in agentic deployments. 1/3 🔗 mlcommons.org/2025/06/ares-a…
We’re up to exciting things here at Patronus AI, working at the forefront of AI optimization and evaluation! Recently, we launched Percival, a SOTA AI Agent debugger, and have previously released industry-standard benchmarks for agents like TRAIL and BLUR, as well as…
Thank you, Professor @Zhou_Yu_AI and @bklsummithouse, for the AI Agents in Action: Industry × Academia Exchange! @rebeccatqian, our CTO, was on a panel with Vinay Rao (Advisor at @AnthropicAI), @ShunyuYao12 (Research Scientist at @OpenAI), Robert Parker (Founder of Perceptix),…

Welcome Peng Wang to the team! 🎉 Peng joins Patronus AI as Head of Applied Research. Previously, he was Head of Research at Grammarly, Head of AI at AlphaSense, and an ML Engineer at Google Research. Peng’s research interests include: LLM personalization and contextualization,…

Spotlighting our newest open-source agent benchmark: TRAIL 🥳 Grounded in multi-step evals and real-world agentic issues, we created a novel taxonomy of 20+ agentic error types spanning reasoning, planning, and system execution. Following this taxonomy, we build on…

Today, we are proud to share our latest customer story: CARIAD of Volkswagen Group. @cariad_tech, @VW's software organization, leverages @PatronusAI's advanced evaluation tools to continuously enhance its in-vehicle AI assistants. When we first met the AI team at CARIAD, we…
Welcome Hersh Mehta to the team! 🎉 Hersh joins us as a Staff Research Engineer. Previously, he led research engineering efforts across companies like @Meta, Cruise, and @Uber. This included zero-to-one AI/ML research projects in self-driving, virtual reality, and other…

To make sure your AI agent is not bullshitting you, you need to evaluate its reasoning... but to do so automatically, you need an LLM... 🤔 so how do you evaluate the trace evaluator? With TRAIL, which contains: - a full taxonomy of agent errors and most frequent failure cases,
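For anyone who wants to poke at TRAIL directly, here is a minimal sketch of pulling the benchmark from the Hugging Face Hub with the `datasets` library. The repo ID `PatronusAI/TRAIL` and the idea that each example is an annotated agent trace are assumptions based on the tweets above; check the dataset card for the actual ID, splits, and schema.

```python
# Minimal sketch: load the TRAIL benchmark and inspect its annotated agent traces.
# Assumptions: the dataset is published as "PatronusAI/TRAIL" on the Hugging Face Hub
# and each row carries a serialized agent trace plus error-taxonomy labels;
# consult the dataset card for the real repo ID, split names, and fields.
from datasets import load_dataset

trail = load_dataset("PatronusAI/TRAIL")   # hypothetical repo ID
split = next(iter(trail.values()))         # grab whichever split exists

print(split.column_names)                  # inspect the actual schema first
print(split[0])                            # one annotated agent trace with its error labels
```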
What if you had an Agent tailor-made for debugging and improving your Agent? I am SUPER EXCITED to publish our newest Weaviate Podcast featuring Anand Kannappan (@anandnk24), Co-Founder of Patronus AI (@PatronusAI)! 🎙️🔥 The Patronus AI team has just launched Percival!…
Check out the very cool work from our friends @PatronusAI 🔥 See it here! huggingface.co/spaces/Patronu…