Stanford AI Lab
@StanfordAILab
The Stanford Artificial Intelligence Laboratory (SAIL), a leading #AI lab since 1963. ⛵️🤖 Emmy-winning video: https://www.youtube.com/watch?v=Cn6nmWlu1EA
Fascinating new paper on AI companionship w/data donation from Character.ai by @Diyi_Yang and colleagues: arxiv.org/abs/2506.12605
@sanmikoyejo gives a nice talk contextualizing our paper contribution in the broader AI Measurement Sciences community in an @StanfordHAI seminar earlier this year: hai.stanford.edu/events/hai-sem… (starting at 30:35) 🧵8/9
Thanks @willccbb!! For those at ICML, I'm giving a talk on Cartridges at the ES-FoMo workshop on Saturday at 10:45 -- come through!! Excited to talk memory, test-time training, and continual learning!
can't stop thinking about this one. insanely elegant, seems insanely powerful
Flying to ICML tomorrow. Excited to present these works with incredible collaborators! RL has come a long way to deliver sizable impacts on problems across distributed systems, planning, cybersecurity, math, and game playing.
🚨 Can your LLM really do math—or is it cramming the test set? 📢 Meet Putnam-AXIOM, an advanced-mathematics, contamination-resilient benchmark that finally hurts FMs. 1. openreview.net/forum?id=kqj2C… 2. icml.cc/virtual/2025/p… #ICML2025 East Exhibition Hall A-B, #E-2502 🧵1/14
We’re presenting Minions at ICML starting now until 1:30pm at E-2907 — come by and chat!!
How can we use small LLMs to shift more AI workloads onto our laptops and phones? In our paper and open-source code, we pair on-device LLMs (@ollama) with frontier LLMs in the cloud (@openai, @together), to solve token-intensive workloads on your 💻 at 17.5% of the cloud cost…
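The local/cloud pairing idea above can be sketched in a few lines. This is a toy illustration with stub functions standing in for the models (the function names and filtering logic are my own assumptions, not the paper's actual protocol): a small on-device model reads the long document and keeps only relevant snippets, so the expensive cloud model sees far fewer tokens.

```python
# Hypothetical sketch: an on-device model filters a long document so that
# only a small fraction of tokens is sent to the cloud model.

def local_filter(document: str, keywords: list[str]) -> str:
    """Stand-in for an on-device LLM: keep only lines mentioning a keyword."""
    kept = [line for line in document.splitlines()
            if any(k in line.lower() for k in keywords)]
    return "\n".join(kept)

def cloud_answer(snippets: str, question: str) -> str:
    """Stand-in for a frontier cloud LLM: answers from the short context."""
    return f"Answer to {question!r} from {len(snippets.split())} filtered tokens."

def token_savings(document: str, snippets: str) -> float:
    """Fraction of tokens that never leave the device."""
    full, sent = len(document.split()), len(snippets.split())
    return 1 - sent / full

doc = "\n".join(["irrelevant filler line"] * 95 + ["revenue grew 12% in Q3"] * 5)
snips = local_filter(doc, ["revenue"])
ans = cloud_answer(snips, "How did revenue change?")
print(token_savings(doc, snips))  # most tokens stay on-device
```

The point of the design: the cheap model does the token-heavy reading, and the frontier model only pays for the distilled context.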
heading to @icmlconf #ICML2025 next week! come say hi & i'd love to learn about your work :) i'll present this paper (arxiv.org/abs/2503.17514) on the pitfalls of training set inclusion in LLMs, Thursday 11am here are my talk slides to flip through: ai.stanford.edu/~kzliu/files/m…
An LLM generates an article verbatim—did it “train on” the article? It’s complicated: under n-gram definitions of train-set inclusion, LLMs can complete “unseen” texts—both after data deletion and adding “gibberish” data. Our results impact unlearning, MIAs & data transparency🧵
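The n-gram subtlety above has a simple toy formalization (my own illustration, not the paper's exact definition): if "in the training set" means "every n-gram of the text appears in the corpus," then a text can count as included even though it was never a training document, because its n-grams are covered by other documents.

```python
# Toy n-gram notion of train-set inclusion: a text is "in" the corpus if
# all of its n-grams appear somewhere in the corpus.

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def included(text: str, corpus_docs: list[str], n: int = 2) -> bool:
    corpus_ngrams = set()
    for doc in corpus_docs:
        corpus_ngrams |= ngrams(doc.split(), n)
    return ngrams(text.split(), n) <= corpus_ngrams

corpus = ["the cat sat on", "sat on the mat"]
target = "the cat sat on the mat"   # never a corpus document...
print(included(target, corpus))     # ...yet "included" under the 2-gram test
```

This is why deletion is tricky under such definitions: removing one document need not remove its n-grams, since other documents can still cover them.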
Just uploaded our code for multi-agent scientific research idea generation (accepted to #SIGDIAL2025): github.com/g6000/MultiAge… This is an extended version of @stanfordnlp's implementation. Thanks to @ChengleiSi and @tatsu_hashimoto for providing the original code😀
Today we launch Asimov. Asimov is our code research agent that is best-in-class in codebase comprehension. It is built for teams, built for enterprises, and built to remember. We use it everyday to accelerate our velocity and streamline distributed ops. Link below to sign up…
Come chat with me at our ICML poster about interpretability as a communication problem, and the need to derive new words for referencing language model concepts! 4:30PM-7, East Exhibition Hall A-B #E-500 We Can’t Understand AI Using our Existing Vocabulary
Understanding and control are two sides of the problem of communicating differing concepts between humans and machines. New position paper: Robert Geirhos, @_beenkim, and I argue we must develop neologisms - new words - for human and machine concepts to understand and control AI
I am presenting our position paper: "Societal Impacts Research Requires Benchmarks for Creative Composition Tasks" at #ICML2025 today at 11 am in #E-500! Come by and say hi! This paper won the Best Societal Impacts Paper Award at the Bi-align workshop 🎉🥳
If you're at ICLR, come check out the @bi_align workshop tomorrow! I'll be giving an oral presentation at 14:50 on our recent position paper: "Societal Impacts Research Requires Benchmarks for Creative Composition Tasks." arxiv.org/abs/2504.06549
At #ICML2025 in Vancouver 🇨🇦 this week, presenting some work from my first year at Stanford! Come find me at posters or just around the conference! Thursday: KernelBench: Can LLMs Write Efficient GPU Kernels? 11AM East E-2010 Saturday: Kevin: Multi-Turn RL for Generating…
Looking forward to attending ICML! Here are some works on memory/long context, verification, kernel design, multi-model AI systems, and theoretical understanding of test-time scaling from my awesome students and collaborators!
Interested in LLM evaluation reliability & efficiency? Check our ICML’25 paper Reliable and Efficient Amortized Model-based Evaluation arxiv.org/abs/2503.13335 w/ @percyliang @uiuc_aisecure @sanmikoyejo @yuhengtu @VirtueAI_co @StanfordAILab @stai_research @StanfordCRFM 🧵1/9
Recipient of an ICML 2025 Outstanding Paper Award, CollabLLM improves how LLMs collaborate with users, including knowing when to ask questions and how to adapt tone and communication style to different situations. This approach helps move AI toward more user-centric and…
The #SIGIR2025 Best Paper just awarded to the WARP engine for fast late interaction! Congrats to Luca Scheerer🎉 WARP was his @ETH_en MS thesis, completed while visiting us at @StanfordNLP. Incidentally, it's the fifth Paper Award for a ColBERT paper since 2020!* Luca did an…
📢 If you’re at #SIGIR2025 this week, make sure to be at Luca Scheerer’s paper talk: “WARP: An Efficient Engine for Multi-Vector Retrieval” (Wednesday 11am) WARP makes PLAID, the famous ludicrously fast ColBERT engine, another 3x faster on CPUs. With the usual ColBERT quality!
Prompt caching lowers inference costs but can leak private information from timing differences. Our audits found 7 API providers with potential leakage of user data. Caching can even leak architecture info—OpenAI's embedding model is likely a decoder-only Transformer! 🧵1/9
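The side channel can be illustrated with a simulation (the latencies and cache behavior here are hypothetical stand-ins; a real audit measures live API response times). If a provider caches prompt prefixes across users, a request whose prefix was already sent by someone else skips prefill and returns faster, so latency alone confirms a guess about another user's prompt.

```python
# Simulated provider with a global prompt-prefix cache: cache hits skip
# prefill and are noticeably faster, leaking cross-user information.
import random

CACHE = set()

def serve(prompt: str) -> float:
    """Return simulated response latency in seconds."""
    prefix = prompt[:32]
    if prefix in CACHE:
        return 0.05 + random.uniform(0, 0.01)   # hit: prefill skipped
    CACHE.add(prefix)
    return 0.50 + random.uniform(0, 0.01)       # miss: full prefill

victim_prompt = "my secret medical question about X"
serve(victim_prompt)                  # victim's request warms the cache

# Attacker probes a guessed prompt: a fast response implies a cache hit,
# i.e. someone else already sent this prefix.
probe_latency = serve(victim_prompt)
print(probe_latency < 0.25)  # True: the guess is confirmed by timing
```

Per-user (rather than global) caching removes this particular leak, at the cost of fewer cache hits.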
ICML ✈️ this week. open to chat and learn mech interp from you. @aryaman2020 and i have cool ideas about steering, just come to our AxBench poster. new steering blog: zen-wu.social/steer/index.ht… 中文: zen-wu.social/steer/cn_index…
i forgot the whole point of saying you're at a conference is to advertise your poster please come check out AxBench by @ZhengxuanZenWu* me* et al. on Tuesday, 15 July at 11 AM - 1:30 PM
🏆Thrilled that #CollabLLM won the #ICML2025 Outstanding Paper Award! We propose a new approach to optimize human-AI collaboration, which is critical for agents. Congratulations to my fantastic co-authors; great job @ShirleyYXWu and Michel Galley driving the project!👏
Come check out our Spotlight Poster at #ICML2025 tomorrow at 4:30 PM W-813! Anders will be presenting our new work: Algorithms with Calibrated ML Predictions. Paper Link: arxiv.org/abs/2502.02861