Julia Neagu
@JuliaANeagu
Building @QuotientAI ✨ formerly @GitHub @GitHubCopilot 🤖 reformed physicist 👩‍🔬 ~ opinions are my own ~
If you're shipping LLMs to production and still finding out about critical failures from your users, this course is for you. Real-time evals, automated detection, and the tools we use at @QuotientAI to keep AI grounded. On July 30th @jxnlco and I are laying it all out.
how do i catch hallucinations? come learn to implement monitoring systems that catch AI errors as they happen in live production environments with @JuliaANeagu and @QuotientAI. if you register, you'll be sent the recording and study notes after they're done!…
new model suite just dropped.
limbic-tool-use-0.5B → 88.6% accuracy
limbic-tool-use-3B → 94.6% accuracy
limbic-tool-use-7B → 96.2% accuracy
outperforms gpt-4.1 (74.0%) and claude-sonnet-4 (71.1%) on tool use evaluation.
today we're releasing a new small model (0.5B) for detecting problems with tool usage in agents, trained on 50M tokens from publicly available MCP server tools. it's great at picking up on tool accuracy issues and outperforms larger models.
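To make "detecting problems with tool usage" concrete, here is a minimal rule-based sketch of the kind of errors such a detector flags: an agent's proposed tool call checked against the tool's declared parameter schema. All names (`check_tool_call`, the `get_weather` tool) are illustrative assumptions, not Quotient's actual API; a learned model like the one announced above would catch subtler semantic errors beyond these structural checks.

```python
# Hypothetical baseline for tool-call error detection (names are illustrative).
# Validates an agent's proposed call against a declared parameter schema,
# the way MCP servers publish tool definitions.

def check_tool_call(call: dict, tools: dict) -> list[str]:
    """Return a list of detected problems; an empty list means the call looks ok."""
    problems = []
    name = call.get("name")
    if name not in tools:
        problems.append(f"unknown tool: {name!r}")
        return problems
    schema = tools[name]
    args = call.get("arguments", {})
    # Every required parameter must be present.
    for param in schema.get("required", []):
        if param not in args:
            problems.append(f"missing required argument: {param!r}")
    # Every supplied argument must be declared and well-typed.
    for param, value in args.items():
        spec = schema.get("properties", {}).get(param)
        if spec is None:
            problems.append(f"unexpected argument: {param!r}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"argument {param!r} should be a string")
        elif spec["type"] == "number" and not isinstance(value, (int, float)):
            problems.append(f"argument {param!r} should be a number")
    return problems

# Example: a weather tool with one required string parameter.
TOOLS = {
    "get_weather": {
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    }
}

print(check_tool_call({"name": "get_weather", "arguments": {"city": "Berlin"}}, TOOLS))
# → []  (no problems detected)
print(check_tool_call({"name": "get_weather", "arguments": {"zip": 10115}}, TOOLS))
# → ["missing required argument: 'city'", "unexpected argument: 'zip'"]
```

In an eval or monitoring loop, a check like this runs on every tool call the agent emits, and any non-empty result gets logged or sampled for review.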
this is exactly what's possible right now: tiny, fast agents riding shotgun with your main stack. hyper-specialized to double-check tasks you absolutely can't get wrong. they catch and fix mistakes and keep your agents on track. lfg
This is the start of a neat direction for @QuotientAI. In offline evals and online sampling, you can use this to get easy insights into the health of your tool calling. I wonder if in the future something like this could even be used for quick tool corrections in the online app.
how you can catch hallucinations in production with @QuotientAI. sign up for study notes and recordings afterwards, even if you can't attend live: maven.com/p/285276/how-y…
But who's that sexy voice? I finally got around to using @elevenlabsio for our demos!
Just dropped: three new cookbooks for building AI research agents with @ExaAILabs, @LangChainAI, @OpenAI, and @AnthropicAI, now with built-in monitoring from @QuotientAI. Track search relevance. Catch hallucinations. Debug real-world agents as they run.
systematically improving rag sessions for the rest of the summer
1. rethinking rag with @Sourcegraph
2. how to catch hallucinations with @QuotientAI
3. lessons from building verticalized agents
4. billion scale vector search w/ @turbopuffer
links all below!
In light of all the attention that context engineering is getting, today I proudly introduce the second book that Albert Ziegler and I have written together: Context Engineering for LLM Applications.