Jie Zhang (@JieZhang_ETH)

Pinned

Jie Zhang @JieZhang_ETH · Oct 1

Still using MIA to detect the pre-training data of LLMs? Membership Inference Attacks cannot prove that a model was trained on your data!

Jie Zhang @JieZhang_ETH · Jul 18

Today we will present the RealMath benchmark poster at the AI for Math Workshop @icmlconf. ⏰ 10:50-12:20 📍 West Ballroom C. Come by if you want to chat about LLMs' math capabilities on real-world tasks.

Jie Zhang @JieZhang_ETH · May 20

1/ Excited to share RealMath: a new benchmark that evaluates LLMs on real mathematical reasoning, drawn from actual research papers (e.g., arXiv) and forums (e.g., Stack Exchange).

Jie Zhang @JieZhang_ETH · Jul 15

We will present our spotlight paper on the 'jailbreak tax', which measures how useful jailbreak outputs are, tomorrow at ICML. See you Tuesday at 11am at East #804. I’ll be at ICML all week, so reach out if you want to chat about jailbreaks, agent security, or ML in general!

Kristina Nikolic @ ICML @NKristina01_ · Apr 19

Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.

Jie Zhang Retweeted

Edoardo Debenedetti @edoardo_debe · Jun 27

We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: arxiv.org/abs/2503.18813 Code: github.com/google-researc…

Jie Zhang Retweeted

Daniel Paleka @dpaleka · Jun 5

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)

Jie Zhang Retweeted

Xin Cynthia Chen @ ICML2025 @XinCynthiaChen · Jun 2

🎉 Announcing our ICML2025 Spotlight paper: Learning Safety Constraints for Large Language Models. We introduce SaP (Safety Polytope), a geometric approach to LLM safety that learns and enforces safety constraints in an LLM's representation space, with interpretable insights. 🧵

Jie Zhang @JieZhang_ETH · May 13

It’s been a wonderful time working, studying, and hanging out together 😭. Wishing you all the best in this exciting new chapter! 🙉

Javier Rando @javirandor · May 13

Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.

Jie Zhang @JieZhang_ETH · May 2

The Jailbreak Tax got a Spotlight award at @icmlconf. See you in Vancouver!

Jie Zhang @JieZhang_ETH · Apr 27

The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6, and the poster is up from 5pm. See you at the ICLR Building Trust in LLMs Workshop! @iclr_conf

Jie Zhang Retweeted

Kristina Nikolic @ ICML @NKristina01_ · Apr 19

Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.

Jie Zhang Retweeted

Florian Tramèr @florian_tramer · Mar 26

I’ll be mentoring MATS for the first time this summer, together with @dpaleka! Link to apply below.

Jie Zhang Retweeted

Javier Rando @javirandor · Mar 20

At SpyLab we not only do great research but also have great fun 🏔️

Jie Zhang Retweeted

Javier Rando @javirandor · Feb 5

Adversarial ML research is evolving, but not necessarily for the better. In our new paper, we argue that LLMs have made problems harder to solve, and even tougher to evaluate. Here’s why another decade of work might still leave us without meaningful progress. 👇

Jie Zhang @JieZhang_ETH · Jan 6

We are excited that this work has been accepted at @satml_conf! We’ve put together a fun blog post; check it out here: spylab.ai/blog/mia_posit…

Jie Zhang @JieZhang_ETH · Oct 1

Still using MIA to detect the pre-training data of LLMs? Membership Inference Attacks cannot prove that a model was trained on your data!

Jie Zhang Retweeted

Florian Tramèr @florian_tramer · Nov 25

We looked into "Ensemble Everything Everywhere", an adversarial examples defense that caused some excitement. But @JieZhang_ETH broke the current version: arxiv.org/abs/2411.14834 Good time to announce you can also find me somewhere over the rainbow: 🦋 bsky.app/profile/floria…
