Florian Tramèr
@florian_tramer
Assistant professor of computer science at ETH Zürich. Interested in Security, Privacy and Machine Learning
Thrilled to share that Snyk (@snyksec), a leader in cybersecurity, has acquired our AI spin-off @InvariantLabsAI, a year after launch! 🚀 Co-founded with @florian_tramer and PhDs from my lab, Invariant built a SOTA safeguard platform for securing AI agents. Congrats to all!
I found a paper with this ref: - the title is from: arxiv.org/abs/2305.00944 - the author list is from: arxiv.org/abs/2012.07805 - the link is arxiv.org/abs/2302.12173 - in the text ref [1] is for: arxiv.org/abs/2503.18813 How did this happen? Seems too weird for a LLM hallucination
![florian_tramer's tweet image. I found a paper with this ref:
- the title is from: arxiv.org/abs/2305.00944
- the author list is from: arxiv.org/abs/2012.07805
- the link is arxiv.org/abs/2302.12173
- in the text ref [1] is for: arxiv.org/abs/2503.18813
How did this happen? Seems too weird for a LLM hallucination](https://pbs.twimg.com/media/Gv4pLB5XwAAZwp3.png)
We will present our spotlight paper on the 'jailbreak tax' tomorrow at ICML, it measures how useful jailbreak outputs are. See you Tuesday 11am at East #804. I’ll be at ICML all week. Reach out if you want to chat about jailbreaks, agent security, or ML in general!
Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.
Very cool result. In hindsight, this shouldn't be too surprising to anyone who has ever taken a multiple choice exam. Eg if you have a trigonometry problem and the possible solutions are A: 1 B: 3.7 C: -5 D: pi/2 which would you pick (with no knowledge of the question)?
🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯 Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision…
📢Paper Discussion Live📢 Come tonight to chat with us about: Design Patterns for Securing LLM Agents against Prompt Injections Be there, fun awaits! 6pm UTC, discord.gg/y78WFTy4?event…
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: arxiv.org/abs/2503.18813 Code: github.com/google-researc…
Thrilled to share a major step forward for AI for mathematical proof generation! We are releasing the Open Proof Corpus: the largest ever public collection of human-annotated LLM-generated math proofs, and a large-scale study over this dataset!
We’re thrilled to officially join forces with @snyksec! Together, we’re changing the landscape of the agentic AI future. More to come!
RT to help Simon raise awareness of prompt injection attacks in LLMs. Feels a bit like the wild west of early computing, with computer viruses (now = malicious prompts hiding in web data/tools), and not well developed defenses (antivirus, or a lot more developed kernel/user…
If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate an attacker can trick the system into stealing your data!
Simon wrote some very nice thoughts on our recent paper on design patterns for prompt injections. I've been following his writing on prompt injections since the start and his blog remains the best place to get an overview of the problem. I routinely recommend it to new students.
Here are my extensive notes on the paper simonwillison.net/2025/Jun/13/pr…
Anyone building "agentic" systems on top of LLMs needs to take this principle into account every time they design or implement anything that uses tools
We don't claim LLM forecasting is impossible, but argue for more careful evaluation methods to confidently measure these capabilities. Details, examples, and more issues in the paper! (7/7) arxiv.org/abs/2506.00723
Can LLMs predict the future? Who knows... We argue current evaluations of LLM forecasters suffer from too many pitfalls to reliably assess any performance claims.
How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)
😈 BEWARE: Claude 4 + GitHub MCP will leak your private GitHub repositories, no questions asked. We discovered a new attack on agents using GitHub’s official MCP server, which can be exploited by attackers to access your private repositories. creds to @marco_milanta (1/n) 👇
LLMs might one day compete with expert hackers. But the capabilities are not quite there yet. Yet, even if today's LLMs are not *better* at bad stuff than humans, they can be a lot *cheaper* for some of it