Florian Tramèr

@florian_tramer

Assistant professor of computer science at ETH Zürich. Interested in Security, Privacy and Machine Learning

Zürich

Joined October 2019

211Following

6KFollowers

Pinned

Florian Tramèr Retweeted

Martin Vechev@mvechev · Jun 24

Thrilled to share that Snyk (@snyksec), a leader in cybersecurity, has acquired our AI spin-off @InvariantLabsAI, a year after launch! 🚀 Co-founded with @florian_tramer and PhDs from my lab, Invariant built a SOTA safeguard platform for securing AI agents. Congrats to all!

1.0K

Florian Tramèr@florian_tramer · Jul 15

I found a paper with this ref: - the title is from: arxiv.org/abs/2305.00944 - the author list is from: arxiv.org/abs/2012.07805 - the link is arxiv.org/abs/2302.12173 - in the text ref [1] is for: arxiv.org/abs/2503.18813 How did this happen? Seems too weird for a LLM hallucination

florian_tramer's tweet image. I found a paper with this ref:
- the title is from: arxiv.org/abs/2305.00944
- the author list is from: arxiv.org/abs/2012.07805
- the link is arxiv.org/abs/2302.12173
- in the text ref [1] is for: arxiv.org/abs/2503.18813

How did this happen? Seems too weird for a LLM hallucination

7.0K

Florian Tramèr@florian_tramer · Jul 15

We will present our spotlight paper on the 'jailbreak tax' tomorrow at ICML, it measures how useful jailbreak outputs are. See you Tuesday 11am at East #804. I’ll be at ICML all week. Reach out if you want to chat about jailbreaks, agent security, or ML in general!

KKristina Nikolic @ ICML@NKristina01_ · Apr 19

Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.

3.0K

Florian Tramèr@florian_tramer · Jul 4

Very cool result. In hindsight, this shouldn't be too surprising to anyone who has ever taken a multiple choice exam. Eg if you have a trigonometry problem and the possible solutions are A: 1 B: 3.7 C: -5 D: pi/2 which would you pick (with no knowledge of the question)?

NNikhil Chandak@nikhilchandak29 · Jul 4

🚨 Ever wondered how much you can ace popular MCQ benchmarks without even looking at the questions? 🤯 Turns out, you can often get significant accuracy just from the choices alone. This is true even on recent benchmarks with 10 choices (like MMLU-Pro) and their vision…

4.0K

Florian Tramèr Retweeted

Yannic Kilcher 🇸🇨@ykilcher · Jun 28

📢Paper Discussion Live📢 Come tonight to chat with us about: Design Patterns for Securing LLM Agents against Prompt Injections Be there, fun awaits! 6pm UTC, discord.gg/y78WFTy4?event…

11.0K

Florian Tramèr Retweeted

Edoardo Debenedetti@edoardo_debe · Jun 27

We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: arxiv.org/abs/2503.18813 Code: github.com/google-researc…

122

26.0K

Florian Tramèr Retweeted

Jasper Dekoninck@j_dekoninck · Jun 26

Thrilled to share a major step forward for AI for mathematical proof generation! We are releasing the Open Proof Corpus: the largest ever public collection of human-annotated LLM-generated math proofs, and a large-scale study over this dataset!

6.0K

Florian Tramèr Retweeted

Invariant Labs@InvariantLabsAI · Jun 24

We’re thrilled to officially join forces with @snyksec! Together, we’re changing the landscape of the agentic AI future. More to come!

1.0K

Florian Tramèr@florian_tramer · Jun 16

RT to help Simon raise awareness of prompt injection attacks in LLMs. Feels a bit like the wild west of early computing, with computer viruses (now = malicious prompts hiding in web data/tools), and not well developed defenses (antivirus, or a lot more developed kernel/user…

SSimon Willison@simonw · Jun 16

If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate an attacker can trick the system into stealing your data!

559

3.0K

2.0K

420.0K

Florian Tramèr@florian_tramer · Jun 13

Simon wrote some very nice thoughts on our recent paper on design patterns for prompt injections. I've been following his writing on prompt injections since the start and his blog remains the best place to get an overview of the problem. I routinely recommend it to new students.

SSimon Willison@simonw · Jun 13

Here are my extensive notes on the paper simonwillison.net/2025/Jun/13/pr…

2.0K

Florian Tramèr Retweeted

Simon Willison@simonw · Jun 13

Anyone building "agentic" systems on top of LLMs needs to take this principle into account every time they design or implement anything that uses tools

14.0K

Florian Tramèr Retweeted

Daniel Paleka@dpaleka · Jun 5

We don't claim LLM forecasting is impossible, but argue for more careful evaluation methods to confidently measure these capabilities. Details, examples, and more issues in the paper! (7/7) arxiv.org/abs/2506.00723

1.0K

Florian Tramèr@florian_tramer · Jun 5

Can LLMs predict the future? Who knows... We argue current evaluations of LLM forecasters suffer from too many pitfalls to reliably assess any performance claims.

DDaniel Paleka@dpaleka · Jun 5

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)

2.0K

Florian Tramèr Retweeted

Luca Beurer-Kellner@lbeurerkellner · May 26

😈 BEWARE: Claude 4 + GitHub MCP will leak your private GitHub repositories, no questions asked. We discovered a new attack on agents using GitHub’s official MCP server, which can be exploited by attackers to access your private repositories. creds to @marco_milanta (1/n) 👇

494

2.0K

499.0K

Florian Tramèr Retweeted

Florian Tramèr@florian_tramer · May 20

LLMs might one day compete with expert hackers. But the capabilities are not quite there yet. Yet, even if today's LLMs are not *better* at bad stuff than humans, they can be a lot *cheaper* for some of it

1.0K