Tokenization Workshop (TokShop) @ICML2025
@tokshop2025
Let's Talk about Tokenization
Three invited speakers will share their insights at TokShop! Hear from Yuval Pinter @yuvalpi, Desmond Elliott @delliott, and Adrian Łańcucki @AdrianLancucki on cutting-edge tokenization research. Don't miss these keynote presentations! #ICML2025 tokenization-workshop.github.io/speakers



🎤 Meet our expert panelists! Join Albert Gu, Alisa Liu, Kris Cao, Sander Land, and Yuval Pinter as they discuss the Future of Tokenization on July 18 at 3:30 PM at TokShop at #ICML2025.

🏆 Announcing our Best Paper Awards! 🥇 Winner: "BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization" openreview.net/forum?id=AO78C… 🥈 Runner-up: "One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression" openreview.net/forum?id=lC4xk… Congrats! 🎉

The TokShop schedule is now live! Join us at #ICML2025 for invited talks, poster sessions, and a panel on the future of tokenization. tokenization-workshop.github.io/schedule #Tokenization #LLM #NLProc

TokShop @ #ICML2025 got way more submissions than expected! 📈 We could really use a few more reviewers to help out. If you have the capacity to review a #tokenization paper by Saturday, please fill out this form: forms.gle/32A6sQHQrMSb6h… 🙏
Got a good tokenization paper under review at COLM, but the scores were a letdown? 😬 Why bother with a rebuttal when the perfect venue is right around the corner? Submit your paper to the #ICML2025 Tokenization Workshop (TokShop) by May 30! 🚀
Beyond text: Modern AI tokenizes images, too! Vision models split photos into patches, treating each 16x16 pixel square as a "token." 🖼️➡️🔤 #VisualTokenization Interested in tokenization? Join our workshop tokenization-workshop.github.io The submission deadline is already May 30!
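A minimal sketch of the patch idea, assuming a grayscale image stored as nested Python lists and a 224x224 input (both are illustrative assumptions, and `split_into_patches` is a hypothetical helper, not any library's API). Real vision models typically also project each flattened patch through a learned linear layer, which is omitted here.

```python
def split_into_patches(image, patch_size=16):
    """Cut a 2D image (list of pixel rows) into flattened square patches."""
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image in row-major order, one patch-sized step at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten the patch_size x patch_size square into one vector.
            patch = [
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            patches.append(patch)
    return patches


# Synthetic 224x224 image (assumed size); each patch becomes one "token".
image = [[(x + y) % 256 for x in range(224)] for y in range(224)]
tokens = split_into_patches(image)
print(len(tokens), len(tokens[0]))  # 196 patches of 256 pixels each
```

With these assumed sizes, a single image turns into a sequence of 14 x 14 = 196 tokens.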
Got a tokenization paper rejected from ACL? Didn't submit to EMNLP/NeurIPS? Want to present your ACL/EMNLP/NeurIPS work non-archivally? Submit to TokShop @ ICML 2025! The deadline is already May 30! openreview.net/group?id=ICML.… tokenization-workshop.github.io
📣 Call for Papers Alert: TokShop @ ICML 2025! TokShop explores tokenization across all data modalities. Topics include: subword NLP techniques, multimodal approaches, multilingual challenges, post-training modification, alternative representations, and statistical perspectives.
Language matters: Low-resource languages are severely overtokenized. While English uses ~1.2 tokens per word, Tamil, for example, requires more tokens than characters, making #LLMs much costlier for billions of speakers! 💸🌍 Check out our ICML workshop 🔗 tokenization-workshop.github.io
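One way a script can end up with more tokens than characters: if a tokenizer falls back to raw UTF-8 bytes for text it has few learned merges for, every Tamil code point costs three bytes. The sketch below only counts bytes per character as a worst-case stand-in for that fallback; it is an assumption for illustration, not a measurement of any particular LLM tokenizer, and `bytes_per_char` is a hypothetical helper.

```python
def bytes_per_char(text):
    """UTF-8 bytes per Unicode code point: a worst-case proxy for
    tokens per character under a pure byte-level fallback."""
    return len(text.encode("utf-8")) / len(text)


english = "Tamil"
tamil = "தமிழ்"  # the word "Tamil" written in Tamil script

for label, word in [("English", english), ("Tamil", tamil)]:
    print(label, len(word), "chars,", len(word.encode("utf-8")), "bytes,",
          bytes_per_char(word), "bytes/char")
```

Learned merges reduce this cost in practice, but scripts that are rare in the training data get fewer merges and stay closer to the byte-level worst case.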
Did you know BPE (Byte Pair Encoding), the most common LLM tokenizer, was originally a compression algorithm from 1994? #Tokenization #LLM #NLP Want to find out more about tokenization? Join our workshop at ICML! tokenization-workshop.github.io
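A minimal sketch of the merge loop behind BPE, in its compression flavor: repeatedly find the most frequent adjacent pair and replace it with a single new symbol. The original compression algorithm works on bytes and substitutes an unused byte value; this version uses string concatenation on characters instead, purely for readability, and `bpe_merges` is a hypothetical name.

```python
from collections import Counter


def bpe_merges(text, num_merges):
    """Greedily merge the most frequent adjacent symbol pair, up to num_merges times."""
    symbols = list(text)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair in the current symbol sequence.
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so merging would not shorten anything
        merges.append((a, b))
        # Rewrite the sequence left to right, fusing each occurrence of (a, b).
        merged = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols, merges


text = "aaabdaaabac"
symbols, merges = bpe_merges(text, num_merges=10)
print(symbols)
print(merges)
```

LLM tokenizers keep the learned merge list and replay it on new text, rather than storing the compressed output.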
Got a tokenization paper that just didn't make the cut for ICML? Submit it to the Tokenization Workshop TokShop at #ICML2025 -- we'd love to see it there! tokenization-workshop.github.io