Freddie Vargus
@freddie_v4
cto & co-founder @quotientai Research @cohere_labs — past: evals @github Copilot, data @quantopian — Tico 🇨🇷🇺🇸
Quick weekend project: how good are LLMs at "Who's That Pokémon?" answer: not great! I tested some of the best models on a simple game segment from the show with a small benchmark I call PokeShadowBench. some results below

I'm doing AI research - comparing it to how humans think. Think quick, simulate a coin flip in your head. What is the result, heads or tails?
the dawg abides
@code_star can i get a vibe check lord Data Dawg?
Legal Research with Tavily + Quotient AI
Retrieving legal facts from trusted sources demands more than just relevance—it requires precision, citations, and verifiable accountability. Here’s how we built an automated legal research agent using:
- Tavily for search and extraction…
didn't expect this to get as much attention as it did! we're working on more exciting things so if you're looking to understand problems with your agents or want to make them better, message me or @JuliaANeagu and let us know 🙂 also try Quotient
today we're releasing a new small model (0.5B) for detecting problems with tool usage in agents, trained on 50M tokens from publicly available MCP server tools. it's great at picking up on tool accuracy issues and outperforms larger models
how you can catch hallucinations in production with @QuotientAI. sign up for study notes and recordings afterwards even if you can't attend live maven.com/p/285276/how-y…
Just dropped: three new cookbooks for building AI research agents with @ExaAILabs, @LangChainAI, @OpenAI, and @AnthropicAI — now with built-in monitoring from @QuotientAI. Track search relevance. Catch hallucinations. Debug real-world agents as they run.