Workshop on Large Language Model Memorization

@l2m2_workshop

The First Workshop on Large Language Model Memorization.

World

Joined September 2024

12 Following

107 Followers

Pinned

Workshop on Large Language Model Memorization@l2m2_workshop · May 16

📢 @aclmeeting notifications have been sent out, making this the perfect time to finalize your commitment. Don't miss the opportunity to be part of the workshop! 🔗 Commit here: openreview.net/group?id=aclwe… 🗓️ Deadline: May 20, 2025 (AoE) #ACL2025 #NLProc

Workshop on Large Language Model Memorization Retweeted

Ai2@allen_ai · Apr 9

For years it’s been an open question — how much is a language model learning and synthesizing information, and how much is it just memorizing and reciting? Introducing OLMoTrace, a new feature in the Ai2 Playground that begins to shed some light. 🔦

Workshop on Large Language Model Memorization@l2m2_workshop · Apr 10

Do language models just copy text they've seen before, or do they have generalizable abilities? ⬇️ This new tool from Ai2 will be very useful for such questions! Allow me to also plug our paper on this topic: we find that LLMs are mostly not copying! direct.mit.edu/tacl/article/d… 1/2

Ai2@allen_ai · Apr 9

Workshop on Large Language Model Memorization Retweeted

Jiacheng Liu@liujc1998 · Apr 8

As infini-gram surpasses 500 million API calls, today we're announcing two exciting updates: 1. Infini-gram is now open-source under Apache 2.0! 2. We indexed the training data of OLMo 2 models. Now you can search in the training data of these strong, fully-open LLMs. 🧵 (1/4)

Workshop on Large Language Model Memorization@l2m2_workshop · Apr 2

Hi all, a reminder that our direct submission deadline is April 15th! We are co-located with ACL'25, and you can submit archival or non-archival work. You can also submit work published elsewhere (non-archival). Hope to see your submission! sites.google.com/view/memorizat…

Workshop on Large Language Model Memorization Retweeted

Abhilasha Ravichander@lasha_nlp · Mar 21

Want to know what training data has been memorized by models like GPT-4? We propose information-guided probes, a method to uncover memorization evidence in *completely black-box* models, without requiring access to 🙅‍♀️ Model weights 🙅‍♀️ Training data 🙅‍♀️ Token probabilities 🧵1/5

Workshop on Large Language Model Memorization Retweeted

Niloofar (✈️ ICML)@niloofar_mire · Mar 3

Adding or removing PII in LLM training can *unlock previously unextractable* info. Even if “John.Mccarthy” never reappears, enough Johns & Mccarthys during post-training can make it extractable later! New paper on PII memorization & n-gram overlaps: arxiv.org/abs/2502.15680

Workshop on Large Language Model Memorization Retweeted

Ashwinee Panda@PandaAshwinee · Mar 13

we show for the first time ever how to privacy audit LLM training. we give new SOTA methods that show how much models can memorize. by using our methods, you can know beforehand whether your model is going to memorize its training data, and how much, and when, and why! (1/n 🧵)

Workshop on Large Language Model Memorization@l2m2_workshop · Nov 9

🎉 Happy to announce that the L2M2 workshop has been accepted at @aclmeeting! #NLProc #ACL2025 More details will follow soon. Stay tuned and spread the word! 📣
