sanjana
@sanjanayed
Berkeley EECS, Arize Phoenix
Just wrapped up a tutorial where I use a custom annotations tool to build an end-to-end evaluation & experimentation pipeline🚀 Inspired by an article from @eugeneyan, I explore how to leverage annotations to construct evals, design thoughtful experiments, and systematically improve…
🔥🔥🔥🔥Phoenix update - pumped to start using this
📈 @ArizePhoenix now has project dashboards! In the latest release @arizeai Phoenix comes with a dedicated project dashboard with: 📈 Trace latency and errors 📈 Latency Quantiles 📈 Annotation Scores Timeseries 📈 Cost over Time by token type 📊 Top Models by Cost 📊 Token…
what a world we live in! I just took a Jupyter notebook that implements an LLM evaluator, provided it as context to a coding assistant, and then asked it to write an evaluator class with specified inputs, outputs, etc. initially, it was verbose with too many class methods, but with…
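An evaluator class of the kind described above might look like the sketch below. This is purely illustrative — the names (`LLMEvaluator`, `EvalResult`, the `judge` callable) are hypothetical stand-ins, not the actual notebook's code.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    label: str        # "correct" / "incorrect" / "unknown"
    score: float      # 1.0 for correct, 0.0 otherwise
    explanation: str  # the judge model's raw reasoning

class LLMEvaluator:
    """Wraps a judge LLM behind a small, fixed interface."""

    def __init__(self, judge, prompt_template: str):
        self.judge = judge                      # any callable: str -> str
        self.prompt_template = prompt_template  # uses {input} and {output}

    def evaluate(self, input_text: str, output_text: str) -> EvalResult:
        prompt = self.prompt_template.format(input=input_text, output=output_text)
        raw = self.judge(prompt)
        lower = raw.lower()
        # Check "incorrect" first: the substring "correct" appears inside it.
        if "incorrect" in lower:
            label = "incorrect"
        elif "correct" in lower:
            label = "correct"
        else:
            label = "unknown"
        return EvalResult(label, 1.0 if label == "correct" else 0.0, raw)
```

Keeping the interface to a single `evaluate` method is exactly the kind of trimming the tweet describes — the assistant's first draft had too many class methods.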
helped write this one! excited to have it out now
Modern agents are increasingly complex — they’re multiple agents connected through complex routing logic and handovers, often multimodal, and connecting to MCP servers as tools. Agent observability is no longer a nice-to-have. This can help. bit.ly/4f3gHWn
Libraries: central.sonatype.com/search?q=arize Github: github.com/Arize-ai/openi…
🚀 Introducing OpenInference Java! We're excited to announce the launch of OpenInference Java, a comprehensive solution for tracing AI applications using OpenTelemetry This is fully compatible with any OpenTelemetry compatible collector or backend! 📦 What’s included: ✅…
Prompts, like models, should improve with feedback — not stay static. Here’s how prompt learning works: 1️⃣ The prompt is treated as an online object — something that evolves over time 2️⃣ An LLM (or human) provides an assessment and a natural-language critique in English, unlike…
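The two steps above can be sketched as a single update function. This is a minimal sketch assuming two hypothetical LLM callables (`critique_model`, `rewrite_model`, each `str -> str`) — any LLM client could fill those roles; it is not Arize's implementation.

```python
def prompt_learning_step(prompt: str, example: str, output: str,
                         critique_model, rewrite_model) -> str:
    # 1. An LLM (or human) assesses the output and writes a critique
    #    in plain English rather than a scalar reward.
    critique = critique_model(
        f"Task prompt:\n{prompt}\n\nInput:\n{example}\n\nOutput:\n{output}\n\n"
        "Assess the output and explain, in plain English, what the prompt "
        "should do differently."
    )
    # 2. The critique is folded back into the prompt — the prompt is an
    #    online object that evolves with feedback, not a static string.
    return rewrite_model(
        f"Current prompt:\n{prompt}\n\nCritique:\n{critique}\n\n"
        "Rewrite the prompt to address the critique. Return only the new prompt."
    )
```

Run in a loop over failing examples, this gives the "RL in English" flavor described in the next tweet: the learning signal is a readable critique instead of a gradient.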
Reinforcement Learning in English – Prompt Learning Beyond just Optimization @karpathy tweeted something this week that I think many of us have been feeling: the resurgence of RL is great, but it’s missing the big picture. We believe that the industry chasing traditional RL is…
Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
The secret to prompt optimization is evals Saw this tweet by Jason Liu and it got me thinking about the future of prompt optimization Most of us are in Cursor/Claude Code and it makes a ton of sense to keep prompts close to code and iterate on them with AI code editors The hard…
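"Keep prompts close to code and score them with evals" reduces to something like the sketch below — names (`score_prompt`, `best_prompt`, the `model` and `grader` callables) are illustrative assumptions, not a specific library's API.

```python
def score_prompt(prompt: str, eval_set, model, grader) -> float:
    """Average pass rate of model(prompt, input) against expected outputs.

    eval_set: list of (input, expected) pairs
    model:    callable (prompt, input) -> output
    grader:   callable (output, expected) -> bool
    """
    passed = sum(grader(model(prompt, x), expected) for x, expected in eval_set)
    return passed / len(eval_set)

def best_prompt(candidates, eval_set, model, grader) -> str:
    """Pick the prompt version that scores highest on the eval set."""
    return max(candidates, key=lambda p: score_prompt(p, eval_set, model, grader))
```

With prompt versions checked into the repo, an AI code editor can propose a new candidate and the eval score decides whether it ships — the evals, not vibes, drive the optimization.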
holy shit lmfao claude code has been writing a prompt, looking at 200 failures and updating the prompt, it went from v1 recall@1 60 -> 80 v2 recall@1 6 -> 52
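For reference, the recall@1 metric quoted above (reported here as a percentage) is just the fraction of queries whose gold item lands in the top-k results — a minimal sketch:

```python
def recall_at_k(results, relevant, k: int = 1) -> float:
    """Percent of queries whose relevant item appears in the top-k results.

    results:  list of ranked result lists, one per query
    relevant: list of the gold item for each query
    """
    hits = sum(1 for ranked, gold in zip(results, relevant) if gold in ranked[:k])
    return 100.0 * hits / len(results)
```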
If you’re in the trenches with agents, this will save you time and sanity. Seeing everything click visually takes the guesswork out of spotting what's off
🚀 Working with multi-agent systems? Arize Agent Visibility lets you actually see how your agents are structured automatically. Out of the box with frameworks like Agno, AutoGen, CrewAI, Mastra, SmolAgents & more. No extra setup. Here’s what it brings: ✅ Auto-generated…
Agents using phoenix-support in @cursor_ai aren’t just coding. They’re pulling in best practices, docs, and auto-updating tracing setups without humans in the loop. Feels like self-improving developer workflows!!
🔧 @ArizePhoenix MCP gets a phoenix-support tool for @cursor_ai / @AnthropicAI Claude / @windsurf ! You now can click the add to cursor button on phoenix and get a continuously updating MCP server config directly integrated into your IDE. @arizeai/[email protected] also comes…
Ever wonder if your agent’s actually getting it right over a whole convo, not just one step? New Session-Level Evals in Arize AX let you do exactly that by measuring: 🌀 Coherence across the session 🧩 Context retention across turns 🎯 Whether users actually reach their goals…
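The shape of a session-level eval is simple: group trace records by session and judge the full transcript instead of one step. A hedged sketch — the record fields (`session_id`, `turn`) and the `judge` callable are illustrative assumptions, not the Arize AX API.

```python
from collections import defaultdict

def session_level_eval(traces, judge):
    """Score whole conversations, not single steps.

    traces: list of dicts like {"session_id": ..., "turn": ...}
    judge:  callable (full transcript str) -> float, e.g. a coherence
            or goal-completion grade from an LLM judge
    """
    sessions = defaultdict(list)
    for t in traces:
        sessions[t["session_id"]].append(t["turn"])
    # One score per session, computed over the concatenated transcript,
    # so context retention across turns is visible to the judge.
    return {sid: judge("\n".join(turns)) for sid, turns in sessions.items()}
```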