Terra Blevins
@TerraBlvns
Postdoc @ViennaNLP and incoming asst professor @Northeastern @KhouryCollege || PhD @uwnlp || she/her
I’m very excited to join @Northeastern @KhouryCollege as an assistant professor starting Fall '25!! Looking forward to working with the amazing people there! Until then I'll be a postdoc at @ViennaNLP with Ben Roth, so reach out if you want to meet up while I'm over in Europe ✨
The Austrian Academy of Sciences is offering a pretty generous package to researchers in the US who would like to come to Austria for a postdoc. stipendien.oeaw.ac.at/en/fellowships…. Please email me if you're interested in applying for this by Jul 25 🧑‍🔬
🚨 Reminder: Paper submissions for the 1st Tokenization Workshop (TokShop) at #ICML2025 are due today May 30! 🔗CFP: tokenization-workshop.github.io
🚨 NEW WORKSHOP ALERT 🚨 We're thrilled to announce the first-ever Tokenization Workshop (TokShop) at #ICML2025 @icmlconf! 🎉 Submissions are open for work on tokenization across all areas of machine learning. 📅 Submission deadline: May 30, 2025 🔗 tokenization-workshop.github.io
LLMs bring a lot of prior knowledge to downstream tasks, but how well can they actually generalize OOD? Check out our new preprint to find out for biographical relation extraction!
Relation Extraction or Pattern Matching? How well do RE models generalise to OOD data? We find that higher in-distribution scores do not necessarily translate to better transferability. Pre-print: arxiv.org/abs/2505.12533
🚀 Introducing the Byte Latent Transformer (BLT) – An LLM architecture that scales better than Llama 3 using byte-patches instead of tokens 🤯 Paper 📄 dl.fbaipublicfiles.com/blt/BLT__Patch… Code 🛠️ github.com/facebookresear…
Extremely excited to share that I will be joining @UBC_CS as an Assistant Professor this summer! I will be recruiting students this coming cycle!
Language identification is a huge barrier for creating high-quality datasets in low-resource languages. If you speak a language other than English, I encourage you to contribute to this project: dynabench.org/tasks/text-lan…
commoncrawl.org/blog/expanding…
@ViennaNLP group at #EMNLP in Miami! 🌴 #EMNLP2024 #nlproc
🚨How well can LLMs perform under lexical ambiguity? Excited to present our #EMNLP2024 paper, where we analyze large language models' performance and self-consistency when prompted with ambiguous entities. A joint work with the awesome @MaiNLPlab! (1/5)🧵
I'll be presenting this work today at #EMNLP2024 in Poster Session B (Riverfront Hall) at 2pm 🌴 Stop by to chat about building better and fairer multilingual models
Expert language models go multilingual! Introducing ✨X-ELM✨(Cross-lingual Expert Language Models), a multilingual generalization of the BTM paradigm to efficiently and fairly scale model capacity for many languages! Paper: arxiv.org/abs/2401.10440
Are you working on NLP for low-resource or non-Latin script languages? If so, I have great news for you! Our MYTE tokenizer and MyT5 models 🪲 are now easily available through 🤗. They're easy to try out!
💡We find that models “think” 💭 in English (or in general, their dominant language) when processing distinct non-English or even non-language data types 🤯 like texts in other languages, arithmetic expressions, code, visual inputs, & audio inputs ‼️ 🧵⬇️arxiv.org/abs/2411.04986
Reminder that I'm recruiting NLP-oriented Linguistics PhD students with a passion for low-resource languages this cycle! Applications are due December 1 (see link). I'll be at #EMNLP2024 next week if anyone wants to chat, so feel free to reach out! sas.rochester.edu/lin/graduate/a…
🎉 Happy to share that our paper on Cross-lingual Expert LMs has been accepted to #EMNLP2024!! Come say hi in Miami if you want to learn more 🌴
Happy to say that this work has been accepted to Findings of #EMNLP2024! Thanks to my fantastic co-authors for getting it across the finish line. I'll probably come to Miami to present it, so come say hi if you find the work interesting!
Preprint! We test methods to adapt a crosslingual model to a language family, and argue for targeted multilinguality as a middle ground for low-resource langs, avoiding the "curse of multilinguality" arxiv.org/abs/2405.12413 w/@TerraBlvns, @quirkyDhwani, @dwija_parikh, @ssshanest
📢 Calling all #NLProc enthusiasts! Submit your tutorial and workshop proposals to 2025 *ACL conferences (NAACL, ACL, EMNLP) through one joint call! Tutorials: 2025.naacl.org/calls/tutorial… Workshops:2025.naacl.org/calls/workshop…
The call for ACL workshops is out! Deadline: October 1st, 2024. Note: We have slightly changed what we ask for in the proposal, so please take a look at the requirements! Chairs: @TerraBlvns @chgravier @lexicutioner @kentonmurray & Saab Mansour aclweb.org/portal/content… #nlproc
Super excited to finally release the Goldfish models, joint work with @tylerachang. Goldfish is a suite of small, comparable models for 350 languages, including the first dedicated monolingual language models for many of them. huggingface.co/goldfish-models
Do you like yellow? Then, according to LLMs, you are probably a school bus driver! Excited to share our new paper about Semantic Leakage in Language Models! Joint work with my wonderful collaborators @terra @alisawuffles @luke @nlpnoah Paper: gonenhila.github.io/files/Semantic… 1/10
Universal NER is gearing up for our next data release!! We're still looking for many commonly spoken languages (Spanish, Hindi, and more!), so check out the blogpost and discord if you want to help build UNER v2 ⬇️
The Universal NER project had a great year 🎉, with a data release and a NAACL paper. Now we're gearing up for the next one, aiming to add 7 more languages by the end of the year. Want to help out? Discord here: discord.gg/2UyyzwEA Read more here: mayhewsw.github.io/2024/07/30/uni…