Braintrust
@braintrustdata
The end-to-end platform for building world-class AI apps.
Evals should be easy. Meet Loop, the AI agent for automatic prompt, dataset, and scorer optimization. @aiDotEngineer

We've learned critical lessons from helping teams ship reliable LLM-powered products. Organizations using Braintrust run over 3,000 evaluations daily, providing us with unique insights into what actually works. Read more on the blog: braintrust.dev/blog/five-less…
Ankur Goyal is the CEO and Founder of @braintrustdata. He joins the show with @seanfalconer to talk about Braintrust and the unique challenges of developing evaluations in a non-deterministic context. @ankrgyl softwareengineeringdaily.com/2025/06/10/the…
The best founders operate in a state of paranoia. @ankrgyl learned the high stakes of building software early in his career when he was closing a big deal at Goldman Sachs. A managing director told him that if the product crashed, employees could lose their jobs. Now, as…
people seem to really love Loop. every day, people ask, "how do I build an agent like this?" the answer is simple :) use @braintrustdata
This Wednesday, 7/16, join us for a live session on how to evaluate agents. We'll share how we built Loop: the AI agent for automatic prompt, dataset, and scorer optimization. Register here: lu.ma/7zz12cdf?utm_s…
We put @xai's Grok 4 to the ultimate test: Can it create a good 'pelican riding a bicycle' SVG? (h/t @simonw) Learn more about how you can benchmark new models for your specific use case w/ Braintrust: braintrust.dev/blog/grok-4
We're hiring!
Gentle reminder: we're hiring across a lot of roles, including systems (brainstore), product/design eng, infra (help us make aws/azure/gcp/k8s deployments gr8), support, growth, sales, BD, SE, you name it. Anyone who wants to chat about roles this weekend, I'm around.
when you make a 1-line change to update a model and your @braintrustdata evals improve accuracy and reduce latency by 50%
🧑‍💻💡 @braintrustdata helps its customers build world-class AI applications on #AWS. At #NYTechWeek, Ankur Goyal, Founder & CEO of Braintrust, explains how exciting new platform features help streamline #AI development for the startup’s customers. 👇
This blog post is hot off the press🗞️ New article on how we write tests (evals) for Neon's MCP Server. We're using @braintrustdata for their SDK and runtime as well. We would like to have *a lot* more tests, but that'll come with time. Find the link below ⬇️⬇️
View eval traces in a friendly, chat-like UI with the new thread layout.

LIVE NOW: What to do when a new model comes out
- How do I know if it will be a good fit for my app?
- How might it impact cost?
- Will users see improvements in the outputs of my app?
Join here: lu.ma/s2g0jn6g
AI is a fundamentally new workload that demands a fundamentally new infrastructure stack. Proud to be working on some of the core components — evals and observability — which exactly fit this pattern. Pre-AI, evals (CI/CD) and observability are separate things. You don't feed…
The Redpoint InfraRed 100 is now live! This list honors 100 infrastructure innovators who are transforming how businesses scale, secure, and succeed. Check out this year's honorees and dive deeper with the dynamic list and our complete InfraRed Report linked below.
This Thursday, right after AI Engineer World's Fair, we are (with @braintrustdata @usepylon) organizing a private event at one of SF's most exclusive venues. Space is very limited, so apply if you haven't already: lu.ma/zuh7onim