Albert Gu
@_albertgu
assistant prof @mldcmu. chief scientist @cartesia_ai. leading the ssm revolution.
Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn fundamental to intelligence. Our new architecture, which enables hierarchical *dynamic chunking*, is not only tokenizer-free, but also simply scales better.
Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data.
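(For a concrete picture of the idea: below is a minimal, hypothetical PyTorch sketch of what a "dynamic chunking" step could look like - a learned scorer marks chunk boundaries over raw byte embeddings, and each chunk is pooled into one higher-level vector. The names BoundaryScorer and dynamic_chunk are made up for illustration; this is not the actual H-Net code.)

import torch
import torch.nn as nn

class BoundaryScorer(nn.Module):
    # Scores how likely each position is to end a chunk, from adjacent embeddings.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, 1)

    def forward(self, x):  # x: (seq_len, dim)
        pairs = torch.cat([x[:-1], x[1:]], dim=-1)      # (seq_len - 1, 2*dim)
        scores = torch.sigmoid(self.proj(pairs)).squeeze(-1)
        return torch.cat([scores, torch.ones(1)])       # force a boundary at the end

def dynamic_chunk(x, scores, threshold=0.5):
    # Mean-pool contiguous low-level vectors into one chunk at each boundary.
    chunks, start = [], 0
    for t, s in enumerate(scores):
        if s >= threshold:
            chunks.append(x[start:t + 1].mean(dim=0))
            start = t + 1
    return torch.stack(chunks)                          # (num_chunks, dim)

# Toy usage: 16 "byte" embeddings of width 32 compress into a shorter sequence
# that a higher-level model would then operate on.
byte_embs = torch.randn(16, 32)
scorer = BoundaryScorer(32)
chunks = dynamic_chunk(byte_embs, scorer(byte_embs))
print(byte_embs.shape, "->", chunks.shape)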
I'll be giving the first H-Net talk this afternoon at 4:30-5 PT at the ES-FoMo workshop! come support the fight against Big Token 🙏
Looking forward to seeing everyone for ES-FoMo part three tomorrow! We'll be in East Exhibition Hall A (the big one), and we've got an exciting schedule of invited talks, orals, and posters planned for you tomorrow. Let's meet some of our great speakers! 1/
One of my favorite moments at #ICML2025 was being able to witness @_albertgu and the @cartesia_ai team’s reaction to Mamba being on the coffee sign. Felt surreal seeing someone realize their cultural impact.
BPE transformer watching an H-Net output an entire wikipedia article as one chunk
Just saw the phrase "Big Token" to describe OAI/Anthropic/GDM/xAI/Meta and now I can't stop thinking about it.
I just saw @_albertgu call the major AI labs "Big Token" and it has to be the most hilarious shit ever lol
I'm at ICML for the week!! come find the @cartesia_ai booth to chat about architectures, tokenizers, voice AI, etc @sukjun_hwang and @fluorane will also be around to talk about H-Nets 🙌
We’ll be talking about fine-grained differences between Transformers and SSMs, and how to distill them better. Lots of surprising findings in this paper!
@_albertgu and I are presenting today at 11 a.m. in East Exhibition Hall A-B (E-2712). If you’re interested in the capability gap between Transformers and SSMs—and how to close it—come by and chat!
impressive results on super long-form speech generation (> 10 minutes)! glad to see that the intuitions here closely track what I wrote in my blog post on SSMs vs Transformers x.com/_albertgu/stat… 1. SSMs make more sense for long context where coherence matters more…
Excited to share Long-Form Speech Generation with Spoken LMs at #ICML2025 (Wed. oral)! We’ll present: - LibriSpeech-Long: new benchmark and evals for long-form generation quality - SpeechSSM: 1st *textless* spoken LMs for expressive *unbounded* speech Listen and learn more: 🧵
Big Token is quaking in their boots. don't worry, we're here to free you all
...wtf anthropic?