Tyler Chang
@tylerachang
Research scientist @GoogleDeepMind. He/him/his.
We're organizing a shared task to develop a multilingual physical commonsense reasoning evaluation dataset! Details on how to submit are at: sigtyp.github.io/st2025-mrl.html
As part of the workshop, we are also organizing a shared task to collaboratively develop a multilingual physical commonsense reasoning evaluation dataset. See the shared task page for more information: sigtyp.github.io/st2025-mrl.html
Excited to announce the call for papers for the Multilingual Representation Learning workshop #EMNLP2025 sigtyp.github.io/ws2025-mrl.html with @_dataman_ @linguist_cat Jiayi Wang @fdschmidt @tylerachang @hila_gonen and amazing speakers: Alice Oh, Kelly Marchisio, & Pontus Stenetorp
The call for papers is out for the 5th edition of the Workshop on Multilingual Representation Learning, which will take place in Suzhou, China, co-located with EMNLP 2025! See details below!
Presenting our work on training data attribution for pretraining this morning: iclr.cc/virtual/2025/p… -- stop by Hall 2/3, poster #526, if you're here at ICLR!
We scaled training data attribution (TDA) methods ~1000x to find influential pretraining examples for thousands of queries in an 8B-parameter LLM over the entire 160B-token C4 corpus! medium.com/people-ai-rese…
One of the major pieces of feedback that we got on the last Turing test is that it was "too easy" because it used a 2-player format where you just speak to *either* a human or a model. We've revamped the site to make it more similar to Turing's original setup:
turingtest.live now uses a 3-party format, where you chat with a human and an AI simultaneously. Can you tell them apart? Live now, and every day from 1–2 PM & 8–9 PM GMT, at turingtest.live.
✨New pre-print!✨Successful language technologies should work for a wide variety of languages. But some languages have systematically worse performance than others. In this paper we ask whether performance differences are due to morphological typology. Spoiler: I don’t think so!
My paper with @tylerachang, “When is Multilinguality a Curse?”, was awarded outstanding paper! Thank you @emnlpmeeting ❤️
Announcing the 20 **Outstanding Papers** for #EMNLP2024
Our paper “When is Multilinguality a Curse?” will be presented at #EMNLP2024! We found that multilingual data hurts high-resource language performance, but improves low-resource performance as much as increasing training data by 33% @tylerachang arxiv.org/pdf/2311.09205
Super excited to finally release the Goldfish models, joint work with @tylerachang: small, comparable language models for 350 languages, including the first dedicated monolingual models for many of these languages. huggingface.co/goldfish-models
New preprint with @tylerachang and Benjamin Bergen! We find that some languages need up to five times as many bytes to convey the same amount of information: arxiv.org/pdf/2403.00686…