Sanmi Koyejo
@sanmikoyejo
I lead @stai_research at Stanford. Co-founder @VirtueAI_co
🚨 Keynote alert! We’re thrilled to welcome @sanmikoyejo as our next speaker in #DLI2025! Catch the session "Beyond benchmarks: Building a science of AI measurement" on Tuesday, August 19 at 9 am GMT+2. 🌐 Join us now via the Virtual Indaba: deeplearningindaba.com/2025/virtual-i… #Urunana…
Huge congrats to @RuneKvist and team on launching AIUC-1. Excited to see this unfold!
To accelerate AI adoption, we need an AI standard. What Moody’s is for bonds, FICO for credit, SOC 2 for security. Standards offer credible signals of who to trust. They create confidence. Confidence accelerates adoption. Introducing AIUC-1: the world’s first AI agent standard
Proudly organized by - @BerivanISIK (Google) - @beyzaermis (Cohere) - @Diyi_Yang (Stanford) - @MariusHobbhahn (Apollo Research) - @attaluri_nithya (Google DeepMind) - @RishiBommasani (Stanford) - @YangjunR (U Toronto) 3/3
Featuring talks from experts in the field: - @esindurmusnlp (Anthropic) - @isabela_alb (Google DeepMind) - @_jasonwei (OpenAI) - @seo_minjoon (KAIST) - @natolambert (Allen Institute) - @orf_bnw (Google DeepMind) - @sanmikoyejo (Stanford) - @lxuechen (xAI) 2/3
We are happy to announce our @NeurIPSConf workshop on LLM evaluations! Mastering LLM evaluation is no longer optional -- it's fundamental to building reliable models. We'll tackle the field's most pressing evaluation challenges. For details: sites.google.com/corp/view/llm-…. 1/3
Evaluating LLMs is one of the most critical and nuanced challenges in AI today. I am super excited to be co-organizing this workshop @NeurIPSConf to discuss the most pressing evaluation challenges. Details 👇
Sparking AI Security Research Impact at ICML 2025: We’re thrilled to share that the Virtue AI team had 19 papers accepted at ICML 2025, one of the most prestigious conferences in AI and machine learning. Our work spanned critical advances in Security for AI agents and AI…
Sorry I meant this picture! :D Not the same one twice, X doesn't let me update my post.
Come to my second poster session about Data-centric Machine Learning (DMLR)! At 209-2010! #ICML2025
Come to 208-209 ICML data workshop and chat with me about how to use data optimally! Scale isn't everything! Ask me how to use it beyond post-training ;) - Scale isn’t enough: LLM performance rises with training‑task alignment more than with data volume. - Robust Alignment…
Joint work with @ObbadElyas Mario Krrish Aryan @sanmikoyejo Me Sudarsan at @stai_research ! Thank you! 🧵3/3
We first demonstrated scale isn't enough in our Beyond Scale paper using the diversity coefficient! x.com/_akhaliq/statu… thanks for featuring us @_akhaliq ! Work led by @_alycialee et al! 🧵 2/3
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data paper page: huggingface.co/papers/2306.13… Current trends to pre-train capable Large Language Models (LLMs) mostly focus on scaling of model and dataset size.…
⚒️We also use Leni’s awesome Pantograph as a Python-Lean interface! #TACAS2025 🧵18/14 github.com/stanford-centa…
🙏We also want to thank our verification experts @VeryCuellar and John Sarrancino 🧵17/14 goto.ucsd.edu/~john/ galois.com/team/santiago-…
🎉We also want to thank the other agentic framework we feature and their help answering all my questions @lateinteraction @dilarafsoylu @kristahopsalong ! Guess…DSPy! 🧵16/14 dspy.ai
👏We want to acknowledge Trace as one of the awesome agentic frameworks we used! Invented by @allenainie @adith387 @chinganc_rl 🧵15/14 microsoft.github.io/Trace/
🌐 Find reviews and paper on OpenReview, feel free to drop us constructive criticism if you want! openreview.net/forum?id=rWkGF… Full final code, data etc will be released in September for ICLR. Enjoy the research preview! If you want to get the data set early DM me. 🧵13/14
🎯 From “looks right” ➜ mathematically verified. Visit our poster #ICML2025 West Ballroom C Fri 18 Jul 10:50 a.m. PDT — 12:20 p.m. PDT. Thanks @sanmikoyejo @stai_research @zhankezhou @allenainie @kaifronsdal @westonkirk_ @ObbadElyas @YingLi1839269 @dilarafsoylu Leni Aniva…
💡 Amazing related work! But how we differ: CLEVER (proof puzzles), FVApps (breadth), DafnyBench (different prover). VeriBench = end-to-end code → tests → proofs plus security stakes. 🧵12/14