Tomasz Limisiewicz @ ICML
@TomLimi
Postdoctoral researcher at @meta FAIR and @uwnlp. Interested in the inner workings of neural networks, multilingualism, and fairer NLP (he/him)
Excited to continue my research adventure as a postdoc at @uwnlp and @Meta! I’ve joined @LukeZettlemoyer's fantastic lab. Together, we plan to rethink how LLMs perceive data to unlock their capabilities for uncharted languages and, further, beyond text! [🦋posting]
![TomLimi's tweet image](https://pbs.twimg.com/media/GnYFb2vaEAEQ4O8.jpg)
![TomLimi's tweet image](https://pbs.twimg.com/media/GnYFdEBbAAAjsCp.jpg)
Happening now in Meeting 112-113 @icmlconf!
Three invited speakers will share their insights at TokShop! Hear from Yuval Pinter @yuvalpi, Desmond Elliott @delliott, and Adrian Łańcucki @AdrianLancuckii on cutting-edge tokenization research. Don't miss these keynote presentations! #ICML2025 tokenization-workshop.github.io/speakers
🎤 Meet our expert panelists! Join Albert Gu, Alisa Liu, Kris Cao, Sander Land, and Yuval Pinter as they discuss the Future of Tokenization on July 18 at 3:30 PM at TokShop at #ICML2025.
BLT model weights are out! Responding to popular demand, we just open-sourced model weights for our 1B and 8B BLT models for the research community to play with! huggingface.co/facebook/blt Hoping to see many new and improved BLT based architectures this year!
🏆 Announcing our Best Paper Awards! 🥇 Winner: "BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization" openreview.net/forum?id=AO78C… 🥈 Runner-up: "One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression" openreview.net/forum?id=lC4xk… Congrats! 🎉
most controversial statement so far from @alisawuffles: "tokenization research is not as cool" **very vocal disagreement from the crowd of tokenization nerds**
🔥tokenization panel!
Panel on Future of Tokenization is happening now in Meeting 111-112. With: @alisawuffles @_albertgu @yuvalpi @magikarp_tokens @kroscoo Moderated by: @esalesky

Full house at the @tokshop2025 tokenization workshop at #ICML2025 today!
Check out the Byte Latent Transformer poster at @tokshop2025. It’s just a foretaste of the main presentation coming soon at @aclmeeting from @ArtidoroPagnoni!

I'm pleased to be in Vancouver for @icmlconf this week 🇨🇦🤖. I'll be happy to chat about multilingual, multimodal LMs and tokenization(free).

Got a good tokenization paper under review at COLM, but the scores were a letdown? 😬 Why bother with rebuttal when the perfect venue is right around the corner! Submit your paper to the #ICML2025 Tokenization Workshop (TokShop) by May 30! 🚀
📝 Submit papers (up to 9 pages, shorter submission) via OpenReview: openreview.net/group?id=ICML.… 🗓️ Important dates: Deadline: May 30, 2025 Notifications: June 9, 2025 Workshop: July 18, 2025 Both archival and non-archival options available! #ICML2025 #TokShop #ML #NLProc
If you are at #ICLR25 and care about tokenizers, drop by our (@Aleph__Alpha's) Birds of a Feather session, happening now at Opal 103.
It’s finally official: the long-awaited Tokenization Workshop is here! 🔡🤩
🚨 NEW WORKSHOP ALERT 🚨 We're thrilled to announce the first-ever Tokenization Workshop (TokShop) at #ICML2025 @icmlconf! 🎉 Submissions are open for work on tokenization across all areas of machine learning. 📅 Submission deadline: May 30, 2025 🔗 tokenization-workshop.github.io
So, apparently, confusing these two buttons can ignite a serious flame-war in reviewer-author discussion.🔥 @ReviewAcl @aclmeeting [🦋posting]
![TomLimi's tweet image](https://pbs.twimg.com/media/GnoEMMeaMAMqg1f.jpg)
there’s a great talk coming up later today in Miami (@emnlpmeeting). FOMO.
Very excited to give a keynote talk at @mrl2024_emnlp tomorrow titled "Balanced and Efficient tokenization across languages"!
The 4th Workshop on Multilingual Representation Learning @mrl2024_emnlp will happen tomorrow, 16th of November, 09:00-17:00. Website: sigtyp.github.io/ws2024-mrl.html Proceedings: aclanthology.org/events/emnlp-2… Looking forward to the keynotes by Karen Livescu, @seb_ruder and @hila_gonen
People want to attend ✅ Organizers want to organize ✅ @aclmeeting doesn't want to have it, sad 😦
What if we have a "Workshop for NLP Tokenizers" at @aclmeeting? 🤔 Invited speakers wishlist: - @taku910 (Sentencepiece) - @hauntsaninja (tiktoken) - @magikarp_tokens - someone from @huggingface tokenizers lib