arize-phoenix
@ArizePhoenix
Open-Source AI Observability and Evaluation
🌟 Observe 2025 kicked off with a packed keynote. We just dropped a stack of new features across Phoenix. Here’s what’s new 👇
Put out a tutorial yesterday on annotating your trace data to build strong evals and experiments. It seemed to resonate, so I thought I’d share a bit more: before jumping into your eval pipeline, you have to look at your data. There’s no one way to do this, but annotations help you…
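Rough sketch of that first step in Python with the arize-phoenix client (the project name and annotation name are made up, and the exact method names are from memory, so check the docs for your version):

```python
# Pull recent spans, eyeball your data, then record a human annotation.
import phoenix as px
from phoenix.client import Client

# Look at the data first: fetch spans from a (hypothetical) project
spans_df = px.Client().get_spans_dataframe(project_name="my-app")
print(spans_df[["name", "span_kind"]].head())

# After reviewing a span, capture your judgment as an annotation
client = Client()  # defaults to a local Phoenix, or reads PHOENIX_* env vars
client.annotations.add_span_annotation(
    span_id=spans_df.index[0],      # the span you just reviewed
    annotation_name="correctness",  # hypothetical annotation name
    annotator_kind="HUMAN",
    label="correct",
    score=1.0,
)
```

Those labeled spans become the seed data for evals and experiments downstream.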
For Arize Observe today, we set out to bring together the people shaping the future of AI, and wow, did you show up. From hallway debates to main stage moments, it was a day full of energy, insight, and some real talk about where this field is headed. Here are a few standout…
Design Iteration >> Design Overhaul @arizeai @ArizePhoenix #AI #LLM #LLMOps
📈 @ArizePhoenix now has project dashboards! In the latest release, @arizeai Phoenix comes with a dedicated project dashboard with:
📈 Trace latency and errors
📈 Latency quantiles
📈 Annotation scores timeseries
📈 Cost over time by token type
📊 Top models by cost
📊 Token…
Just wrapped up a tutorial: I use a custom annotations tool to build an end-to-end evaluation & experimentation pipeline 🚀 Inspired by an article from @eugeneyan, I explore how to leverage annotations to construct evals, design thoughtful experiments, and systematically improve…
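The shape of that pipeline with Phoenix datasets and experiments, sketched here with toy data (the dataset name, columns, and exact signatures are assumptions; adapt to your version):

```python
import pandas as pd
import phoenix as px
from phoenix.experiments import run_experiment

# Promote annotated examples into a dataset
df = pd.DataFrame({
    "question": ["What does Phoenix do?"],
    "expected": ["Open-source AI observability and evaluation"],
})
dataset = px.Client().upload_dataset(
    dataset_name="annotated-examples",  # hypothetical dataset name
    dataframe=df,
    input_keys=["question"],
    output_keys=["expected"],
)

def task(input):
    # Stand-in for your real app or model call
    return "Open-source AI observability and evaluation"

def exact_match(output, expected):
    # Evaluator: score the task output against the annotated expectation
    return float(output == expected["expected"])

run_experiment(dataset, task, evaluators=[exact_match], experiment_name="baseline")
```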
Thanks for all the love on Prompt Learning! We're really excited about the potential of using English feedback in the prompt learning loop. We’ve been benchmarking our Prompt Optimizer against real-world datasets. First up: Big Bench Hard (BBH) – 50 randomly sampled tasks, 1…
Using the tool I helped build to build the tool I'm building. This is the way. #OSS
Tracing and telemetry have traditionally been an operational requirement, not a development one. But I've found that with AI applications, this fundamentally changes. Take LLM-as-a-judge, for example. You might pick an off-the-shelf eval library and trust that it works. But this…
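For example, rolling your own judge with phoenix.evals keeps the prompt in view instead of trusting a black box (the template and labels below are placeholders; parameter names can vary by version):

```python
import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

df = pd.DataFrame({
    "input": ["What is the capital of France?"],
    "output": ["Paris is the capital of France."],
})

TEMPLATE = """You are grading an answer.
Question: {input}
Answer: {output}
Is the answer correct? Respond with exactly one word: correct or incorrect."""

results = llm_classify(
    dataframe=df,  # {input}/{output} in the template fill from these columns
    model=OpenAIModel(model="gpt-4o-mini"),  # any supported judge model
    template=TEMPLATE,
    rails=["correct", "incorrect"],  # constrain the judge's labels
    provide_explanation=True,        # keep the judge's reasoning for review
)
print(results[["label", "explanation"]])
```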
The cool thing about #OSS is that you can build the tools you yourself wish you had. Just started building out TypeScript evals, and it's feeling pretty intuitive; the tooling around LLMs, with things like the Vercel AI SDK, feels really good. The thing that I find sorta lacking…
🚀 In case you missed it, Amazon Bedrock support is in the Phoenix Playground! Teams testing @Anthropic, @MistralAI, and @Meta models on Bedrock often end up juggling scripts, provider consoles, and scattered dashboards just to compare results. Instead, in Phoenix: ✅ Run…
As systems grow beyond one-off prompts and into long-running workflows, context becomes a key engineering surface. Learn 6️⃣ principles for optimizing context 📚, courtesy of @schavalii arize.com/docs/phoenix/l…
🔧 @ArizePhoenix MCP gets a phoenix-support tool for @cursor_ai / @AnthropicAI Claude / @windsurf! You can now click the “Add to Cursor” button in Phoenix and get a continuously updating MCP server config directly integrated into your IDE. The latest @arizeai/phoenix-mcp release also comes…
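For anyone wiring it up by hand, the config that lands in your IDE looks roughly like this (values are placeholders and the flag names are assumptions from the package README):

```json
{
  "mcpServers": {
    "phoenix": {
      "command": "npx",
      "args": [
        "-y",
        "@arizeai/phoenix-mcp@latest",
        "--baseUrl", "http://localhost:6006",
        "--apiKey", "your-phoenix-api-key"
      ]
    }
  }
}
```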
As you tweak prompts and models to improve performance… do you know what it’s costing you? 💸 @ArizePhoenix Cost Tracking makes it clear where your LLM spend is going, so you can catch runaway costs before they get out of hand.
1. Track token usage across models and…
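Cost tracking rides on tracing: once spans carry token counts, Phoenix can roll up spend by model and token type. Setup looks roughly like this, assuming the OpenAI instrumentation (the project name is hypothetical):

```python
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

# Send traces to Phoenix under a (hypothetical) project
tracer_provider = register(project_name="my-app")

# Auto-instrument OpenAI calls so prompt/completion token counts
# land on spans, which cost tracking then aggregates
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```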
Solid community 📚 paper reading coming up: authors of the Self-Adapting LLMs (SEAL) paper, @adamzweiger and @jyo_pari of @MIT, answering your questions live lu.ma/szk3zrbd
In case you missed some big news from Arize Observe 2025: Phoenix Cloud just leveled up with Spaces & Access Management
✨ You can now create multiple, tailored Phoenix Spaces for your team and projects
🔑 Easily manage user permissions in each space
👥 Zero-hassle team…
Solid conference day. @arizeai Observe year 2 was a hit. Digging the ice sculpture of @ArizePhoenix and da homies