Matthieu Meeus
@matthieu_meeus
PhD student @ImperialCollege Privacy/Safety + AI https://matthieumeeus.com/
(1/9) LLMs can regurgitate memorized training data when prompted adversarially. But what if you *only* have access to synthetic data generated by an LLM? In our @icmlconf paper, we audit how much information synthetic data leaks about its private training data 🐦🌬️
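For context, the core idea is to compute a membership signal for a target (canary) record using only the text the model generated, never the model itself. A minimal sketch of one such signal, assuming a smoothed bigram model over the synthetic corpus (the paper's actual attacks are more involved; all names below are hypothetical):

```python
# Minimal sketch, not the paper's exact attack: score how likely a candidate
# record looks under an add-alpha smoothed bigram model fit ONLY on the
# synthetic corpus. Canaries present in the private training data tend to
# score higher than held-out canaries, giving a membership signal.
from collections import Counter
import math

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

def fit_counts(synthetic_corpus):
    # synthetic_corpus: list of token lists generated by the target model
    uni, bi = Counter(), Counter()
    for toks in synthetic_corpus:
        uni.update(toks)
        bi.update(bigrams(toks))
    return uni, bi

def membership_score(candidate_tokens, uni, bi, vocab_size, alpha=1.0):
    # Average log-probability of the candidate under the smoothed bigram model.
    logp = sum(
        math.log((bi[(w1, w2)] + alpha) / (uni[w1] + alpha * vocab_size))
        for w1, w2 in bigrams(candidate_tokens)
    )
    return logp / max(1, len(candidate_tokens) - 1)
```

Scoring many injected vs. held-out canaries this way and sweeping a threshold would then give an ROC curve for the audit.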

Presenting two papers at the MemFM workshop at ICML! Both touch upon how near duplicates (and beyond) in LLM training data contribute to memorization.
- arxiv.org/pdf/2405.15523
- arxiv.org/pdf/2506.20481
@_igorshilov @yvesalexandre
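(Not from either paper, just to make the near-duplicate setup concrete: a toy check via Jaccard similarity over character 5-grams; real pipelines would use MinHash/LSH at scale.)

```python
# Toy illustration only: flag near-duplicate training documents with Jaccard
# similarity over character 5-gram sets. The shingle size and threshold are
# arbitrary choices for the demo, not values from the papers.
def char_ngrams(text, n=5):
    return {text[i:i + n] for i in range(max(0, len(text) - n + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a or b) else 0.0

def near_duplicate_pairs(docs, threshold=0.8):
    shingles = [char_ngrams(d) for d in docs]
    return [
        (i, j)
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
        if jaccard(shingles[i], shingles[j]) >= threshold
    ]
```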

Lovely place for a conference! Come see my poster on privacy auditing of synthetic text data at 11am today! East Exhibition Hall A-B #E-2709
Will be at @icmlconf in Vancouver next week! 🇨🇦 I'll be presenting our poster on privacy auditing of synthetic text (presentation here: icml.cc/virtual/2025/p…) and two papers at the MemFM workshop (icml2025memfm.github.io). Hit me up if you want to chat!
New paper accepted @ USENIX Security 2025! We show how to identify training samples most vulnerable to membership inference attacks - FOR FREE, using artifacts naturally available during training! No shadow models needed. Learn more: computationalprivacy.github.io/loss_traces/ Thread below 🧵
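Roughly what "artifacts naturally available during training" could look like in code. A minimal PyTorch-style sketch under my own assumptions (the actual method and ranking statistic are in the paper):

```python
# Sketch, not the paper's recipe: log per-sample losses during ordinary training,
# then rank samples by a simple statistic of their loss trace as a proxy for
# membership-inference vulnerability. No shadow models, no extra training runs.
import torch
import torch.nn.functional as F

def train_and_trace(model, loader, optimizer, epochs, num_samples, device="cpu"):
    traces = torch.zeros(epochs, num_samples)  # loss of each sample at each epoch
    model.to(device)
    for epoch in range(epochs):
        for x, y, idx in loader:  # loader must also yield each sample's dataset index
            x, y = x.to(device), y.to(device)
            logits = model(x)
            per_sample_loss = F.cross_entropy(logits, y, reduction="none")
            traces[epoch, idx] = per_sample_loss.detach().cpu()
            per_sample_loss.mean().backward()
            optimizer.step()
            optimizer.zero_grad()
    return traces

def rank_by_vulnerability(traces):
    # Hypothetical heuristic: samples whose loss stays high throughout training
    # are often the outliers that membership inference attacks flag most easily.
    return torch.argsort(traces.mean(dim=0), descending=True)
```

The ranking could then be validated by comparing it against per-sample MIA success on held-out attack runs.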
How good can privacy attacks against LLM pretraining get if you assume a very strong attacker? Check it out in our preprint ⬇️
Are modern large language models (LLMs) vulnerable to privacy attacks that can determine whether given data was used for training? Models and datasets are both quite large, so what should we even expect? Our new paper looks into exactly this question. 🧵 (1/10)
Google presents Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models
!!!
Without its international students, Harvard is not Harvard. hrvd.me/IntStudents25t
🚨One (more!) fully-funded PhD position in our group at Imperial College London – Privacy & Machine Learning 🔐🤖 starting Oct 2025 Plz RT 🔄
Yes, yes, I know about the fundamental law of information recovery and differential privacy, but if you only release a few summary statistics, surely that should be anonymous? 🥸 I definitely used to think so, until we started looking into it two years ago. A thread 🧵
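As a toy illustration of why that intuition fails (my own example, not the thread's actual setting): two releases of a single "anonymous" average are enough to pin down one person's value exactly.

```python
# Toy differencing attack: an average released before and after one person
# joins the group reveals that person's value exactly. Numbers are made up.
def difference_attack(mean_before, n_before, mean_after, n_after):
    total_before = mean_before * n_before
    total_after = mean_after * n_after
    return total_after - total_before  # exactly the newcomer's value

# Average salary of 99 employees is 50_000; after one hire it is 50_500 over 100.
print(difference_attack(50_000, 99, 50_500, 100))  # -> 100000.0
```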