Kristina Nikolic @ ICML
@NKristina01_
PhD student @ ETH Zurich, working on AI safety / Uni of Cambridge MLMI graduate / Prev. Google Intern / Alumna of the Mathematical Grammar School in Serbia
Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.
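A minimal sketch of how a utility-drop metric in this spirit can be computed, assuming a benchmark with ground-truth answers (e.g., math problems); the function names and the normalization here are illustrative, not the paper's exact definition:

```python
def accuracy(answers, ground_truth):
    # Fraction of graded answers that match the ground truth.
    return sum(a == g for a, g in zip(answers, ground_truth)) / len(ground_truth)

def jailbreak_tax(base_answers, jailbroken_answers, ground_truth):
    # Relative utility drop: accuracy lost when the same model answers
    # the same tasks through a jailbreak instead of directly.
    base_acc = accuracy(base_answers, ground_truth)
    jb_acc = accuracy(jailbroken_answers, ground_truth)
    return (base_acc - jb_acc) / base_acc if base_acc > 0 else 0.0

# Toy example: 80% accuracy when asked directly, 40% via the jailbreak -> tax = 0.5.
print(jailbreak_tax([1, 1, 1, 1, 0], [1, 1, 0, 0, 0], [1, 1, 1, 1, 1]))
```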

Great and comprehensive tutorial on jailbreaking and the threats to agentic AI systems in both the digital and physical worlds. By @HamedSHassani, @aminkarbasi and @AlexRobey23. I strongly recommend checking out the website: jailbreak-tutorial.github.io
On Monday, I'll be presenting a tutorial on jailbreaking LLMs + the security of AI agents with @HamedSHassani and @aminkarbasi at ICML. I'll be in Vancouver all week -- send me a DM if you'd like to chat about jailbreaking, AI agents, robots, distillation, or anything else!
Today we will present the RealMath benchmark poster at the AI for Math Workshop @icmlconf. ⏰ 10:50–12:20 📍 West Ballroom C. Come if you want to chat about LLMs' math capabilities on real-world tasks.
1/ Excited to share RealMath: a new benchmark that evaluates LLMs on real mathematical reasoning, drawn from actual research papers (e.g., arXiv) and forums (e.g., Stack Exchange).
We will present our spotlight paper on the 'jailbreak tax' tomorrow at ICML; it measures how useful jailbreak outputs are. See you Tuesday at 11am at East #804. I'll be at ICML all week. Reach out if you want to chat about jailbreaks, agent security, or ML in general!
We recently updated the CaMeL paper with results on new models (which improve utility a lot with zero changes!). Most importantly, we released the code with it. Go have a look if you're curious about the details! Paper: arxiv.org/abs/2503.18813 Code: github.com/google-researc…
How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)
The trend in recent LLM benchmarks is to make them maximally hard. It's unclear what this tells us about LLM capabilities "in the wild." So we created a math benchmark from real, organic research. A cool benefit: RealMath can be automatically refreshed as new research is published.
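A toy sketch of the refresh step this enables, pulling recent submissions from the public arXiv export API as a source of fresh material; the category choice is an assumption, and turning papers into verifiable benchmark problems (the hard part) is elided:

```python
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"

def recent_math_papers(max_results=20):
    # Fetch the newest math.NT submissions from the public arXiv API.
    # Each entry is a candidate source of new benchmark problems.
    query = (f"?search_query=cat:math.NT&sortBy=submittedDate"
             f"&sortOrder=descending&max_results={max_results}")
    with urllib.request.urlopen(ARXIV_API + query) as resp:
        feed = ET.fromstring(resp.read())
    return [
        (entry.find(ATOM + "title").text.strip(), entry.find(ATOM + "id").text)
        for entry in feed.iter(ATOM + "entry")
    ]

for title, url in recent_math_papers(5):
    print(title, "->", url)
```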
IMO it's very important to measure LLM utility on tasks that we actually want them to perform well on, not just hard sandbox tasks. This is an excellent benchmark that does exactly that!
It was amazing having @javirandor as a labmate at SPY Lab — in such a short time I learned a lot from him. Excited to see future work from this incredible researcher and great person!
Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.
AutoAdvExBench was accepted as a spotlight at ICML. We agree it is a great paper! 😋 I would love to see more evaluations of LLMs performing real-world tasks with security implications.
Running out of good benchmarks? We introduce AutoAdvExBench, a real-world security research benchmark for AI agents. Unlike existing benchmarks that often use simplified objectives, AutoAdvExBench directly evaluates AI agents on messy, real-world research tasks.
The Jailbreak Tax got a Spotlight award at @icmlconf. See you in Vancouver!
🏆 Super proud to announce: AgentDojo, a research project we did with ETH, just won the first prize of the @ai_risks SafeBench competition. AgentDojo is a really cool agent security benchmark we built with @edoardo_debe and @JieZhang_ETH. Here is why you should check it out 👇
So stoked for the recognition that AgentDojo got by winning a SafeBench first prize! A big thank you to @ai_risks and the prize judges. Creating this with @JieZhang_ETH @lbeurerkellner @marc_r_fischer @mbalunovic @florian_tramer was amazing! Check out the thread to learn more
Now @NKristina01_ is presenting the “jailbreak tax”. It measures how useful jailbreak outputs are for different attacks.
The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at the ICLR Building Trust in LLMs Workshop @iclr_conf
The ICLR Oral is at 11:15am tomorrow in Garnet 212-213, and the poster is up 3pm-5:30pm in Hall 3! x.com/dpaleka/status…
Recent LLM forecasters are getting better at predicting the future. But there's a challenge: How can we evaluate and compare AI forecasters without waiting years to see which predictions were right? (1/11)
📢 Exciting to see a strong focus on AI safety at @iclr_conf 2025! Here's a thread with some standout papers you shouldn't miss:
If you still have some energy after the registration queue, come find me in Hall 3, poster #510, to chat about adversarial SEO for LLMs (don't come too soon though, since I'm also still queuing!)
1/📣We introduce the *prompt injector's dilemma*: as LLMs get deployed in search engines, we show that developers are incentivized to use new forms of search engine optimization to boost their content, and in doing so they might collectively wreak havoc on search engines.
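The strategic structure here is essentially a multi-player prisoner's dilemma; a toy payoff sketch (all numbers below are illustrative assumptions, not measurements from the paper):

```python
def payoff(i_inject: bool, others_inject_frac: float) -> float:
    # Toy payoff for one content provider. Injecting prompts boosts my own
    # visibility, but every injector degrades the shared search engine.
    visibility_boost = 2.0 if i_inject else 0.0           # private gain (assumed)
    engine_quality = 10.0 * (1.0 - others_inject_frac)    # shared resource (assumed)
    return engine_quality + visibility_boost

# Injecting dominates individually, holding others fixed...
assert payoff(True, 0.5) > payoff(False, 0.5)
# ...but universal injection leaves everyone worse off than universal restraint.
assert payoff(True, 1.0) < payoff(False, 0.0)
```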
I will be presenting 6 papers at @iclr_conf and its workshops this year 🎉 🇸🇬 Reach out if you want to chat about any of these 👇(1/9)