William Chen
@chenwanch1
PhD Student @LTIatCMU @SCSatCMU | Masters @LTIatCMU | Formerly @TXInstruments | @UCF ‘21
What happens if you scale Whisper to billions of parameters? Our #ICML2025 paper develops scaling laws for ASR/ST models, training models with up to 18B params, 360K hours of data, and 100+ languages. Joint work b/w @LTIatCMU and @nvidia. arxiv.org/abs/2502.10373
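(For context: a scaling law fits a simple parametric curve, usually a power law, to how eval loss falls as model size or data grows. Below is a minimal sketch with made-up numbers, not the paper's actual functional form or results.)

```python
# Minimal sketch of a scaling-law fit: eval loss vs. model size with a power law
# L(N) = a * N^(-b) + c. All numbers below are hypothetical, for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n_params, a, b, c):
    return a * n_params ** (-b) + c

sizes = np.array([1e8, 3e8, 1e9, 3e9, 1e10])        # model sizes in params (hypothetical)
losses = np.array([1.09, 1.01, 0.93, 0.86, 0.80])   # eval losses (hypothetical)

(a, b, c), _ = curve_fit(power_law, sizes, losses, p0=[5.0, 0.1, 0.3], maxfev=10000)
print(f"fit: L(N) = {a:.2f} * N^(-{b:.3f}) + {c:.2f}")

# Extrapolate the fitted curve to a larger model, e.g. 18B params.
print("predicted loss at 18B params:", power_law(1.8e10, a, b, c))
```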

Multilingual representation alignment through images (and without parallel data for new languages): check out Nate's work at #ACL2025NLP tomorrow. Paper: aclanthology.org/2025.acl-short… More details: 👇
Can multilingual text encoders borrow the semantic space from images to align their representations cross-lingually? I am presenting my paper “Cross-Lingual Representation Alignment Through Contrastive Image-Caption Tuning” at the 4pm poster session at ACL tomorrow! 🧵 (1/9)
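(Rough idea of the mechanism, as a hedged sketch: a CLIP-style contrastive objective pulls each caption toward its paired image embedding, so captions in different languages that share an image land near each other. This is the general technique, not the paper's exact architecture or hyperparameters.)

```python
# Sketch of a symmetric image-caption contrastive (InfoNCE) loss, CLIP-style.
import torch
import torch.nn.functional as F

def image_caption_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so dot products are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # (batch, batch) similarity matrix between every image and every caption.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    # Symmetric InfoNCE: image i should match caption i, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage with random embeddings standing in for encoder outputs.
img = torch.randn(8, 512)
txt = torch.randn(8, 512)
print(image_caption_contrastive_loss(img, txt))
```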
Meows, music, murmurs and more! We train a general-purpose audio encoder and open-source the code, checkpoints, and evaluation toolkit.
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe, "OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder," arxiv.org/abs/2507.14129
The week ahead: 2025-07-21 to 2025-07-27 [Tweet] OpenAI’s new model achieves gold medal-level performance at the IMO. [Blog] Calvin’s thoughts as he departs OpenAI. [Tweet] Thinking Machines Lab, the AI startup led by Mira Murati, OpenAI’s ex-CTO... Read more: llz.info/weekly
One of my favorite moments at #ICML2025 was being able to witness @_albertgu and the @cartesia_ai team’s reaction to Mamba being on the coffee sign. Felt surreal seeing someone realize their cultural impact.

I’ll be presenting this Thursday at 4:30pm in the West Hall, poster 418. Drop by to learn more about our latest experience in burning compute!
What happens if you scale Whisper to billions of parameters? Our #ICML2025 paper develops scaling laws for ASR/ST models, training models with up to 18B params, 360K hours of data, and 100+ languages. Joint work b/w @LTIatCMU and @nvidia. arxiv.org/abs/2502.10373
Presenting our #ICML2025 poster today! Discover our continuous, end-to-end approach that helps speech language models process speech prosody. Come learn more! 📍 W-411 (West Exhibition Hall B2 - B3) ⏰ 4:30 ~ 7:00 PM icml.cc/virtual/2025/p…
Thrilled to share our #ICML2025 paper! We introduce a variational approach for speech language models, automating speech attribute learning to deliver more natural, human-like speech. Joint work b/w @LTIatCMU and @Apple Read it: arxiv.org/abs/2506.14767
Not advertised yet, but we figured out how to do this too. And we release exactly how you can do it 👀. With the right training techniques, you can inject audio understanding and generation into an LLM with almost no loss in text perf. Details at arxiv.org/abs/2506.17611
the best part about the mistral release is that the models don't lose as much on text - this has been the biggest pain point for audio LMs for a long while
how do yall think current day google translate works?? everyone's just stupid now i guess
twitter changed the embedded translation feature to "translate with grok" so now out of sheer spite i am going to learn every single language ever. fuck ai
What is it with speech reviewers on openreview? In my past 3 submissions (EMNLP 24, ICML 25, EMNLP 25), I have gotten only 1 reply to a rebuttal, out of a total of 11 reviews. Very frustrating, esp since they ask for more results and analyses that take a lot of time/compute.
🔊 New release: #ARECHO -> Autoregressive Evaluation via Chain-based Hypothesis Optimization. • 87-metric coverage in one model 🧮 • Dynamic classifier chain 🤝 • Unified tokenization 🧩 • Confidence-aware decoding 🛡️ Built on #UniVERSA, heading to #VERSA. More ↓
🚀 Happy to share our #INTERSPEECH2025 paper: Using speaker & acoustic context, we dynamically adjust model paths, resulting in a 25.7% relative BLEU improvement in speech translation. We also analyze how context influences model behavior. 📜 Paper: arxiv.org/abs/2505.18860
🚀 Introducing Uni-VERSA: a unified model for multi-dimensional speech evaluation: naturalness, intelligibility, noise, prosody & more. ⚡ 109× faster than native VERSA metric computation 🤗 Pretrained models + Colab demo 🧰 VERSA integration coming! 🔗 huggingface.co/collections/es…
Uni-VERSA: Versatile Speech Assessment with a Unified Network. arxiv.org/abs/2505.20741
I’ll be interning at Adobe Research in San Francisco this summer, working on audio generation. HMU if you’re in the area and want to chat about speech / audio AI!

7/7 papers accepted to #Interspeech2025 🎉 Lots of interesting work from my fantastic co-authors on long-form processing, multilingualism, and multi-modal foundation models. See y’all in Rotterdam 🇳🇱
Excited to share our survey paper accepted to #ACL2025NLP Findings: When Large Language Models Meet Speech: A Survey on Integration Approaches by Zhengdong Yang, Shuichiro Shimizu, Yahan Yu, Chenhui Chu (@knccch) 1/5
Do you really need audio to fine-tune your Audio LLM? 🤔 Answer below: Introducing Omni-R1, a simple GRPO fine‑tuning method for Qwen2.5‑Omni on audio question answering. It sets new state‑of‑the‑art accuracies on the MMAU benchmark for Audio LLMs. arxiv.org/abs/2505.09439
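(For readers unfamiliar with GRPO: the core idea is a group-relative advantage. Sample several answers per question, score each with a reward, and normalize the rewards within the group. A hedged sketch of that step only, not the paper's implementation; `group_relative_advantages` is a hypothetical helper name.)

```python
# Sketch of GRPO's group-relative advantage: answers better than their group's
# average get positive advantage and are reinforced. Rewards here could be, e.g.,
# 0/1 correctness on audio question answering.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_questions, group_size) reward for each sampled answer."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 questions, 4 sampled answers each, 0/1 correctness rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(group_relative_advantages(rewards))
```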