Edoardo Debenedetti
@edoardo_debe
Research intern @meta | PhD student @CSatETH 🇨🇭 | AI Security and Privacy 😈🤖 | From 🇪🇺🇮🇹 | prev @google
1/🔒Worried about giving your agent advanced capabilities due to prompt injection risks and rogue actions? Worry no more! Here's CaMeL: a robust defense against prompt injection attacks in LLM agents that provides formal security guarantees without modifying the underlying model!
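A minimal sketch of the idea behind CaMeL, with stubbed LLM calls and a toy policy (names like `Tainted`, `quarantined_parse`, and `policy_allows` are mine for illustration, not the paper's API): a quarantined model only turns untrusted text into data tagged with provenance, and a policy check gates every side-effecting tool call on that provenance.

```python
# Minimal sketch of the CaMeL-style control/data separation (illustration only,
# not the released implementation): untrusted data is tagged with provenance,
# and a policy check runs before any side-effecting tool call.
from dataclasses import dataclass

@dataclass(frozen=True)
class Tainted:
    """A value carrying provenance metadata ("capabilities") with it."""
    value: str
    sources: frozenset = frozenset()

def quarantined_parse(untrusted_text: str) -> Tainted:
    """Stand-in for the quarantined LLM: it only extracts data, never picks tools."""
    return Tainted(untrusted_text.strip(), sources=frozenset({"untrusted"}))

TRUSTED_RECIPIENTS = {"alice@example.com"}

def policy_allows(tool: str, recipient: Tainted) -> bool:
    """Block calls whose arguments flow from untrusted data to unknown parties."""
    if tool == "send_email" and "untrusted" in recipient.sources:
        return recipient.value in TRUSTED_RECIPIENTS
    return True

def send_email(recipient: Tainted, body: Tainted) -> None:
    if not policy_allows("send_email", recipient):
        raise PermissionError("policy blocked send_email")
    print(f"sending to {recipient.value}: {body.value}")

# Code like the lines below would be written by the privileged planner LLM,
# which never sees the untrusted document contents directly:
addr = quarantined_parse("attacker@evil.com")    # address found in an untrusted email
try:
    send_email(addr, Tainted("quarterly report"))
except PermissionError as e:
    print("blocked:", e)
```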

Excited to start as a Research Scientist Intern at Meta, in the GenAI Red Team, where I will keep working on AI agents security. I'll be based in the Bay Area, so reach out if you're around and wanna chat about AI security!

This is huge for anyone building security systems for AI
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: arxiv.org/abs/2503.18813 Code: github.com/google-researc…
This is very exciting! The one thing I really missed from the CaMeL paper was example code implementing the pattern, and now here it is.
"Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns to help protect LLM tool-using systems (call them "agents" if you like) against prompt injection attacks
Our new @GoogleDeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.
why was it `claude-3*-sonnet`, but then it suddenly became `claude-sonnet-4`?
Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?
Anthropic is really lucky to get @javirandor, we'll miss him at SPY Lab!
Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.
AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations of LLMs performing real-world tasks with security implications.
Running out of good benchmarks? We introduce AutoAdvExBench, a real-world security research benchmark for AI agents. Unlike existing benchmarks that often use simplified objectives, AutoAdvExBench directly evaluates AI agents on messy, real-world research tasks.
The Jailbreak Tax got a Spotlight award @icmlconf see you in Vancouver!
Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.
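A toy illustration of how such a metric could be computed (my reading of the idea, not the paper's evaluation code): run the same tasks with and without the jailbreak wrapper and measure the relative accuracy drop.

```python
# Toy "jailbreak tax" computation (illustrative; the paper's exact definition
# and evaluation pipeline may differ).
def jailbreak_tax(baseline_correct: list[bool], jailbroken_correct: list[bool]) -> float:
    """Relative drop in accuracy caused by answering through the jailbreak."""
    base = sum(baseline_correct) / len(baseline_correct)
    jailbroken = sum(jailbroken_correct) / len(jailbroken_correct)
    return (base - jailbroken) / base if base else 0.0

# e.g. 90% accuracy when asked directly, 60% via the grandma persona -> ~33% tax
print(jailbreak_tax([True] * 9 + [False], [True] * 6 + [False] * 4))
```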
Thanks @ai_risks for the generous prize! AgentDojo is the reference for evaluating prompt injections in LLM agents, and is used for red-teaming at many frontier labs. I had a blast working on this with @edoardo_debe @JieZhang_ETH @marc_r_fischer @lbeurerkellner @mbalunovic
We are proud to share that AgentDojo, an Invariant research project done with @ETH, has won the first prize of the @ai_risks SafeBench competition. We truly appreciate this recognition from the community. Learn More: invariantlabs.ai/blog/agentdojo…
So stoked for the recognition that AgentDojo got by winning a SafeBench first prize! A big thank you to @ai_risks and the prize judges. Creating this with @JieZhang_ETH @lbeurerkellner @marc_r_fischer @mbalunovic @florian_tramer was amazing! Check out the thread to learn more
🏆 Super proud to announce: AgentDojo, a research project we did with ETH, just won the first prize of the @ai_risks SafeBench competition. AgentDojo is a really cool agent security benchmark we built with @edoardo_debe and @JieZhang_ETH. Here is why you should check it out 👇
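For readers who have not seen it: roughly, an AgentDojo-style evaluation pairs each user task with an injection planted in the environment and scores both task utility and how often the attacker's injected goal is carried out. The sketch below is hypothetical and simplified, not the benchmark's actual API (see the repo for that).

```python
# Hypothetical sketch of a prompt-injection evaluation harness in the spirit of
# AgentDojo (not its real API).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    user_task: str
    environment: dict        # tool outputs, some containing the injection
    injected_goal: str       # what the attacker wants the agent to do

def evaluate(agent: Callable[[str, dict], dict], cases: list[Case]) -> dict:
    utility = attacks = 0
    for case in cases:
        outcome = agent(case.user_task, case.environment)   # full agent loop
        utility += outcome["task_completed"]
        attacks += outcome["attacker_goal_achieved"]
    n = len(cases)
    return {"utility": utility / n, "attack_success_rate": attacks / n}
```

In practice the benchmark also reports utility in the absence of attacks, which matters when comparing defenses that trade capability for robustness.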
The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at the ICLR Building Trust in LLMs Workshop. @iclr_conf
Presenting 2 posters today at ICLR. Come check them out!
10am ➡️ #502: Scalable Extraction of Training Data from Aligned, Production Language Models
3pm ➡️ #324: Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
I am in Singapore for ICLR this week. Reach out if you would like to chat about AI safety, agent security or ML in general.
Shipment arrived on time. All non-sleepy members of SPY Lab are now in Singapore. Come meet us! @edoardo_debe @dpaleka @AerniMichael @JieZhang_ETH @NKristina01_
Academic bliss for one week:
- semester holidays, so no teaching or meetings
- shipped off all my students to ICLR
Time for a nap