Pietro Lesci
@pietro_lesci
Final-year PhD student @cambridge_uni. Causality & language models | ex @bainandcompany @ecb @amazonscience. Passionate musician, professional debugger.
Super excited and grateful that our paper received the best paper award at #ACL2024 🎉 Huge thanks to my fantastic co-authors — @clara__meister, Thomas Hofmann, @vlachos_nlp, and @tpimentelms — the reviewers that recommended our paper, and the award committee #ACL2024NLP
Happy to share our #ACL2024 paper: "Causal Estimation of Memorisation Profiles" 🎉 Drawing from econometrics, we propose a principled and efficient method to estimate memorisation using only observational data! See 🧵 +@clara__meister, Thomas Hofmann, @vlachos_nlp, @tpimentelms
I will also be sharing more tokenisation work from @cambridgenlp at TokShop, this time on Tokenisation Bias, by @pietro_lesci, @vlachos_nlp, @clara__meister, Thomas Hofmann, and @tpimentelms.
I'm in Vancouver for TokShop @tokshop2025 at ICML @icmlconf to present joint work with my labmates, @tweetByZeb, @pietro_lesci and @julius_gulius, and Paula Buttery. Our work, ByteSpan, is an information-driven subword tokenisation method inspired by human word segmentation.
Mechanistic interpretability often relies on *interventions* to study how DNNs work. Are these interventions enough to guarantee the features we find are not spurious? No! ⚠️ In our new paper, we show many mech interp methods implicitly rely on the linear representation hypothesis 🧵
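For context on what "interventions" means here, a minimal sketch of one common style of linear intervention (ablating a candidate feature direction from activations). The activations and direction below are invented for illustration; this is not the paper's setup.

```python
# Toy sketch, NOT the paper's method: the flavour of intervention many
# mech interp pipelines use, which implicitly assumes a feature is a
# linear direction in activation space. All values here are made up.
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 16))   # hypothetical activations: 4 tokens x 16 dims
feature = rng.normal(size=16)       # hypothetical candidate "feature" direction
feature /= np.linalg.norm(feature)

def ablate(h, d):
    """Project the direction d out of every activation vector (a linear intervention)."""
    return h - np.outer(h @ d, d)

patched = ablate(hidden, feature)
print(np.allclose(patched @ feature, 0))  # True: the direction has been removed
# If the feature is not actually encoded linearly, an intervention like this can
# still change behaviour while pointing at something spurious.
```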
Looking forward to this year's edition! With great speakers: Ryan McDonald @yulanhe Vlad Niculae @anas_ant @raquel_dmg @annargrs @preslav_nakov @mohitban47 @eunsolc @MarieMarneffe!
📢 10 Days Left to apply for the AthNLP - Athens Natural Language Processing Summer School! ✍ Get your applications in before June 15th! athnlp.github.io/2025/cfp.html
If you use LLMs, tokenisation bias probably affects you: * Text generation: tokenisation bias ⇒ length bias 🤯 * Psycholinguistics: tokenisation bias ⇒ systematically biased surprisal estimates 🫠 * Interpretability: tokenisation bias ⇒ biased logits 🤔
A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our #acl2025nlp paper proposes an observational method to estimate this causal effect! Longer thread soon!
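To make the gap concrete, a tiny sketch with invented probabilities (picked only so the ratio lands near the 17x headline); the paper estimates this causal effect from observational data rather than by training two models per string.

```python
# Toy illustration with made-up numbers, not the paper's observational estimator:
# the probability an autoregressive LM assigns to "hello" depends on how the
# string is tokenised.
import math

# Hypothetical log-probabilities from two LMs trained with different tokenisations:
logp_hello_single = math.log(0.017)   # p(<hello>) under the single-token tokenisation
logp_he           = math.log(0.05)    # p(<he>) under the split tokenisation
logp_llo_given_he = math.log(0.02)    # p(<llo> | <he>)

p_single = math.exp(logp_hello_single)
p_split  = math.exp(logp_he + logp_llo_given_he)   # chain rule over the two symbols

print(round(p_single / p_split))   # -> 17: same string, very different probability
```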
🔥 We teach LLMs to say how confident they are on-the-fly during long-form generation. 🤩 No sampling. No slow post-hoc methods. Not limited to short-form QA! ‼️ Just output confidence in a single decoding pass. ✅ Better calibration! 🚀 20× faster runtime. arXiv:2505.23912 👇
If you're finishing your camera-ready for ACL (#acl2025nlp) or ICML (#icml2025) and want to cite co-first authors more fairly, I just made a simple fix to do this! Just add $^*$ to the authors' names in your bibtex, and the citations should change :) github.com/tpimentelms/ac…
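A hypothetical BibTeX entry (names and key invented) to show where the marker goes; the linked repo provides the style fix that makes the rendered citation reflect co-first authorship.

```bibtex
% Hypothetical entry, invented names/key: add $^*$ to each co-first author's
% name in the author field; the style fix then renders the citation accordingly.
@inproceedings{doe-roe-2025-example,
  title  = {An Example Paper With Co-First Authors},
  author = {Doe$^*$, Jane and Roe$^*$, Richard and Smith, Alex},
  year   = {2025},
}
```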
Inception Lab and Gemini Diffusion are hot these days. Just published a blog post on Diffusion Language Models! 🚀 Exploring how diffusion (yes, the image model kind) can be used for text generation. Check it out 👇 spacehunterinf.github.io/blog/2025/diff… #NLP #LLMs #DiffusionModels
📢 @aclmeeting notifications have been sent out, making this the perfect time to finalize your commitment. Don't miss the opportunity to be part of the workshop! 🔗 Commit here: openreview.net/group?id=aclwe… 🗓️ Deadline: May 20, 2025 (AoE) #ACL2025 #NLProc
The call for papers for the 8th @FEVERworkshop at #ACL is out: fever.ai/workshop.html The submission deadline is May 19th! And if you have a paper already reviewed in ARR, you can commit it until June 9th!