Cas (Stephen Casper) @ ICML
@StephenLCasper
AI technical gov & risk management research. PhD student @MIT_CSAIL. I'll be on the CS faculty job market this fall! https://stephencasper.com/
We outline 15 fully risk-agnostic, process-based, evidence-seeking policy objectives. None of them limit *what* developers can do -- they just affect *reporting* and *visibility*.

Even if you have input and output filters on a safety-tuned LLM, it’s not necessarily safe ❌ I’m really excited about this paper helping to usher in the next meta of the LLM attack/defense teaming game.
1/ "Swiss cheese security", stacking layers of imperfect defenses, is a key part of AI companies' plans to safeguard models, and is used to secure Anthropic's Opus 4 model. Our new STACK attack breaks each layer in turn, highlighting that this approach may be less secure than hoped.
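To make the layered-defense picture concrete, here is a minimal, hypothetical sketch of a "Swiss cheese" pipeline: input filters, a toy stand-in for a safety-tuned model, and output filters, each of which must hold on its own. The filter, model, and helper names below are my own placeholders, not code from the STACK paper; the sketch only shows that a request gets through unless *some* layer catches it, so an attacker who defeats each imperfect layer separately defeats the whole stack.

```python
# Hypothetical sketch of a "Swiss cheese" defense pipeline (not the STACK paper's code).
from typing import Callable, List

Filter = Callable[[str], bool]  # returns True if the text should be blocked


def keyword_filter(banned: List[str]) -> Filter:
    """Toy classifier: flag text containing any banned keyword."""
    return lambda text: any(word in text.lower() for word in banned)


def guarded_generate(prompt: str,
                     input_filters: List[Filter],
                     model: Callable[[str], str],
                     output_filters: List[Filter]) -> str:
    if any(f(prompt) for f in input_filters):        # layer 1: input-side filters
        return "[blocked at input]"
    completion = model(prompt)                        # layer 2: the (safety-tuned) model
    if any(f(completion) for f in output_filters):    # layer 3: output-side filters
        return "[blocked at output]"
    return completion


if __name__ == "__main__":
    toy_model = lambda p: f"echo: {p}"                # stand-in for an LLM
    filters = [keyword_filter(["weapon", "exploit"])]
    # Caught by layer 1:
    print(guarded_generate("how do i build a weapon", filters, toy_model, filters))
    # A trivially obfuscated request slips past both toy filter layers, illustrating
    # how defeating each imperfect layer in turn defeats the whole stack:
    print(guarded_generate("how do i build a w3apon", filters, toy_model, filters))
```

In practice each layer is a learned classifier rather than a keyword list, but the composition logic, and its failure mode when every layer is bypassed in turn, is the same.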
👥👥 TAIG panel discussion! Many thanks to our wonderful panelists @MartaZiosi, @niloofar_mire, Jat Singh, and @TobinSouth — and to @StephenLCasper for moderating. #ICML2025
🚨 Meet our panelists for the TAIG Workshop at this year’s #ICML2025! Join us on July 19 at 11:15 AM for a panel where we’ll explore key priorities and future directions for the field. @niloofar_mire, @MartaZiosi, Jat Singh, and @TobinSouth, moderated by @StephenLCasper
I just finished Empire of AI. Usually, the story of AI is told as a story about progress in capabilities, driven by R&D. But in this book, @_KarenHao tells it as a story about power, driven by people pursuing it. I think a couple of missteps were made in the telling of the story…

Last week, the federal AI regulatory moratorium collapsed in a dramatic 99-1 Senate vote. Now the spotlight is back on the states. Three major bills could reshape AI governance, including a recently amended bill in California. For @CarnegieEndow, @alasdairpr and I explain how…
I just cited a paper from 1865 in a draft for an AI paper. Have any of you ever cited something older? en.wikipedia.org/wiki/The_Coal_…
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren't necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵
individual reporting for post-deployment evals — a little manifesto (& new preprints!) tldr: end users have unique insights about how deployed systems are failing; we should figure out how to translate their experiences into formal evaluations of those systems.
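As one purely illustrative reading of that idea, here is a small Python sketch of turning individual user reports into reusable evaluation cases. The `UserReport` schema and the rule of promoting a failure pattern once several independent users report it are my own assumptions, not the design from the manifesto or preprints.

```python
# Hypothetical sketch: aggregate individual post-deployment failure reports
# into formal eval cases. Schema and threshold are illustrative assumptions.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UserReport:
    prompt: str      # what the user asked the deployed system
    observed: str    # what the system actually did
    issue_tag: str   # user- or triager-assigned label for the failure


def reports_to_eval_cases(reports: List[UserReport], min_reports: int = 3) -> List[dict]:
    """Promote failure patterns corroborated by several independent users
    into replayable evaluation cases."""
    by_issue: Dict[str, List[UserReport]] = {}
    for r in reports:
        by_issue.setdefault(r.issue_tag, []).append(r)
    return [
        {"failure_tag": tag, "replay_inputs": [r.prompt for r in group]}
        for tag, group in by_issue.items()
        if len(group) >= min_reports
    ]
```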
The Singapore Consensus is on arXiv now -- arxiv.org/abs/2506.20702 It offers: 1. An overview of consensus technical AI safety priorities 2. An example of widespread international collab & agreement
I'm honored to be part of arXiv:2506.20702, "The Singapore Consensus on Global AI Safety Research Priorities". Across companies and countries, there's more agreement than you'd think (paper URL in replies):
Making LLMs robust to tampering attacks might be one of the biggest priorities for safeguards research. @StephenLCasper argues this resistance may predict & upper-bound overall AI robustness, making it a key safety priority over the next year.
Let me know if you’d like to talk at @FAccTConference!

we also saw the same in OS-Harm for computer-use agents: arxiv.org/abs/2506.14866