Alex Ratner
@ajratner
@SnorkelAI @uwcse / prev @StanfordAILab – Interested in data management systems for machine learning, weak supervision, and impactful applications.
Agentic AI will transform every enterprise–but only if agents are trusted experts. The key: Evaluation & tuning on specialized, expert data. I’m excited to announce two new products to support this–@SnorkelAI Evaluate & Expert Data-as-a-Service–along w/ our $100M Series D! ---…
Not all benchmarks are created equal. We built a PhD-level multiple-choice test across 1,000+ subdomains, STEM, humanities, pro fields. Top LLMs? Scored <20%. This is what it takes to test advanced reasoning. Built with Snorkel’s Expert Data-as-a-Service. #LLM #GenAI
Thanks @lateinteraction! Every time I think about the gazillion prompt / systems engineering tweaks that also go into making an AI system work, I think about how early you were with @DSPyOSS :) Shared theme: find the key human input and make it programmatic.
Every time I think about what it takes to systematically organize the gazillion training tasks that together make a great foundation model, my appreciation for how early @SnorkelAI was increases.
America’s innovative edge makes us great. Tell Congress: ProtectScienceAndInnovation.org Check out (and help push!) this nonpartisan campaign for investing in our most critical national edge! #ProtectScience #InnovationMakesAmericaGreat

Efficient data curation is critical for modern ML. 📣 We introduce Mimic Score, a new, lightweight, model-based metric for sample utility that leverages a reference model's weights to identify high-value samples and accelerate training. 🎉 Accepted as an Oral at ICML’25 DataWorld!
Thanks @willccbb!! For those at ICML, I'm giving a talk on Cartridges at the ES-FoMo workshop on Saturday at 10:45 -- come through!! Excited to talk memory, test-time training, and continual learning!
can't stop thinking about this one. insanely elegant, seems insanely powerful
Excited to share our new work: “Language Models Improve When Pretraining Data Matches Target Tasks” Yes, it sounds obvious (and it is!), but typically this only happens implicitly and indirectly: intuitively select data → benchmark → refine → repeat. We wondered: what…
LLM judges are powerful for automated evaluation but expensive and biased. 📣 Meet PAJAMA, a new framework that distills LLM judging logic into a compact, executable form (a new representation), cutting costs from thousands to just cents. 🚀 We'll present at ICML PRAL on Friday!
Excited to share our latest @SnorkelAI Leaderboard on Finance Reasoning – a realistic but challenging agentic benchmark involving tool use, reasoning, and document analysis over 10-Ks with a top score of just 51.9% accuracy. For LLM agents to thrive in the enterprise, they need…
Heading to #ICML! I’ll be representing SprocketLab at @UWMadison and @SnorkelAI. Reach out if you want to chat about data-centric AI, data development, agents, and foundation models.