Jie Zhang
@JieZhang_ETH
2nd-year PhD student at @ETH, AI privacy & security
Still using membership inference attacks (MIAs) to detect the pre-training data of LLMs? MIAs cannot prove that a model was trained on your data!

Today we will present the RealMath benchmark poster at the AI for Math Workshop @icmlconf. ⏰ 10:50 - 12:20 📍 West Ballroom C. Come by if you want to chat about LLMs' math capabilities on real-world tasks.
1/ Excited to share RealMath: a new benchmark that evaluates LLMs on real mathematical reasoning, drawn from actual research papers (e.g., arXiv) and forums (e.g., Stack Exchange).
We will present our spotlight paper on the "jailbreak tax" tomorrow at ICML; it measures how useful jailbreak outputs are. See you Tuesday at 11am at East #804. I’ll be at ICML all week. Reach out if you want to chat about jailbreaks, agent security, or ML in general!
Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax: a metric that measures the utility drop caused by jailbreaks.
We recently updated the CaMeL paper with results on new models (which improve utility a lot with zero changes!). Most importantly, we released the code. Go have a look if you're curious about the details! Paper: arxiv.org/abs/2503.18813 Code: github.com/google-researc…
How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)
🎉 Announcing our ICML 2025 Spotlight paper: "Learning Safety Constraints for Large Language Models". We introduce SaP (Safety Polytope), a geometric approach to LLM safety that learns and enforces safety constraints in an LLM's representation space, with interpretable insights. 🧵
It’s been a wonderful time working, studying, and hanging out together 😭. Wishing you all the best in this exciting new chapter! 🙉
Career update! I will soon be joining the Safeguards team at @AnthropicAI to work on some of the problems I believe are among the most important for the years ahead.
The Jailbreak Tax received a Spotlight award at @icmlconf. See you in Vancouver!
The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at the ICLR Building Trust in LLMs Workshop. @iclr_conf
I’ll be a MATS mentor for the first time this summer, together with @dpaleka! Link below to apply.
At SpyLab we not only do great research but also have great fun 🏔️
Adversarial ML research is evolving, but not necessarily for the better. In our new paper, we argue that LLMs have made problems harder to solve, and even tougher to evaluate. Here’s why another decade of work might still leave us without meaningful progress. 👇
We are excited that our work showing that membership inference attacks cannot prove a model was trained on your data has been accepted by @satml_conf! We’ve put together a fun blog post; check it out here: spylab.ai/blog/mia_posit…
We looked into "Ensemble Everything Everywhere", an adversarial examples defense that caused some excitement. But @JieZhang_ETH broke the current version: arxiv.org/abs/2411.14834 Good time to announce you can also find me somewhere over the rainbow: 🦋 bsky.app/profile/floria…