Freddie Vargus

@freddie_v4

cto & co-founder @quotientai Research @cohere_labs — past: evals @github Copilot, data @quantopian — Tico 🇨🇷🇺🇸

Boston

Joined June 2012

2K Following

1K Followers

Pinned

Freddie Vargus@freddie_v4 · May 25

Quick weekend project: how good are LLMs at "Who's That Pokémon?" answer: not great! I tested some of the best models on a simple game segment from the show with a small benchmark I call PokeShadowBench. some results below

7.0K

Freddie Vargus Retweeted

John Berryman@JnBrymn · 4 h

I'm doing AI research - comparing it to how humans think. Think quick, simulate a coin flip in your head. What is the result, heads or tails?

361

Freddie Vargus@freddie_v4 · 7 h

the dawg abides

Freddie Vargus@freddie_v4 · 8 h

@code_star can i get a vibe check lord Data Dawg?

844

Freddie Vargus Retweeted

Tavily@tavilyai · 10 h

Legal Research with Tavily + Quotient AI Retrieving legal facts from trusted sources demands more than just relevance—it requires precision, citations, and verifiable accountability. Here’s how we built an automated legal research agent using: - Tavily for search and extraction…

2.0K

Freddie Vargus@freddie_v4 · Jul 22

didn't expect this to get as much attention as it did! we're working on more exciting things so if you're looking to understand problems with your agents or want to make them better, message me or @JuliaANeagu and let us know 🙂 also try Quotient

Freddie Vargus@freddie_v4 · Jul 22

today we're releasing a new small model (0.5B) for detecting problems with tool usage in agents, trained on 50M tokens from publicly available MCP server tools. it's great at picking up on tool accuracy issues and outperforms larger models

1.0K

Freddie Vargus Retweeted

jason liu@jxnlco · Jul 21

how you can catch hallucinations in production with @QuotientAI. sign up for study notes and recordings afterwards even if you can't attend live maven.com/p/285276/how-y…

1.0K

Freddie Vargus Retweeted

Julia Neagu@JuliaANeagu · Jul 18

Just dropped: three new cookbooks for building AI research agents with @ExaAILabs, @LangChainAI, @OpenAI, and @AnthropicAI — now with built-in monitoring from @QuotientAI. Track search relevance. Catch hallucinations. Debug real-world agents as they run.

3.0K