Alexander H. Liu
@alex_h_liu
Ph.D. Student @MIT_CSAIL
The Voxtral tech report is up! arxiv.org/abs/2507.13264 We release these models under a permissive Apache 2.0 license. Feedback is welcome! We have a lot more cooking, this is just the beginning.
💡 Bridging speech, sound, & music representations with one universal model? We introduce USAD ✅
📚 Distills knowledge from domain-specific SSL models
🎯 Matches expert models across speech/audio/music tasks
📄 arxiv.org/abs/2506.18843
🧑‍💻 huggingface.co/MIT-SLS/USAD-B…
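As a rough illustration of the kind of multi-teacher distillation described above, here is a minimal sketch: a single student is trained to predict the features of several frozen domain-specific SSL teachers. The teacher dimensions, linear prediction heads, and L1 loss below are assumptions for illustration, not USAD's actual recipe.

```python
# Minimal multi-teacher feature-distillation sketch (illustrative only, not USAD's code).
# One "universal" student predicts hidden features of several frozen domain-specific
# SSL teachers (e.g. speech / sound / music models).
import torch
import torch.nn as nn

class DistillStudent(nn.Module):
    def __init__(self, dim=768, teacher_dims=(768, 768, 512)):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
            num_layers=6,
        )
        # One linear prediction head per teacher (head design is an assumption).
        self.heads = nn.ModuleList(nn.Linear(dim, d) for d in teacher_dims)

    def forward(self, feats):                     # feats: (B, T, dim) input features
        h = self.encoder(feats)
        return [head(h) for head in self.heads]   # one prediction per teacher

def distill_loss(preds, teacher_feats):
    # L1 distance to each frozen teacher's features, averaged over teachers
    # (the loss choice here is an assumption, not taken from the paper).
    return sum(nn.functional.l1_loss(p, t) for p, t in zip(preds, teacher_feats)) / len(preds)
```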
Highly recommended!!! (Happy to chat if you’re curious about the experience with the team)
Our team at NVIDIA is continuously looking for highly motivated interns to work on machine intelligence for audio understanding and synthesis. Please reach out if you would like to collaborate with us!
Turns out speech self-supervised learning techniques can be generalized to sign language! Great work led by @Shester_G (he's looking for a PhD opportunity this year!)
Ever imagined a foundation model for sign language?! Introducing SHuBERT (Sign Hidden Unit BERT)! With SHuBERT, we get SOTA results on ASL video understanding tasks compared to task-specific models from Google DeepMind, Meta, and Microsoft, while using less compute! 🧵 1/9
💚 Big shoutout to the #FUGATTO team for making this release happen — and to cats like Coltrane and Xenakis, who envisioned a world where "saxophones bark and howl." Together, artists and researchers, let’s build a GPT-like future for audio generation! fugatto.github.io
Q: Why can't we get GPT-level understanding from language models on speech? A: We need better speech tokens! In SyllableLM, *we beat @kyutai_labs Moshi on semantic understanding in 70 hours of training* by making speech tokens at 5 frames/s With @PuyuanPeng, David Harwath 1/n
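For a rough picture of what low-frame-rate speech tokens involve, here is a minimal sketch that mean-pools frame-level SSL features into coarse chunks and quantizes them with k-means. SyllableLM itself learns syllable-like boundaries, so the fixed-size pooling, feature shapes, and codebook size here are simplifying assumptions, not the paper's method.

```python
# Simplified sketch of low-frame-rate speech tokenization (not SyllableLM's actual method).
# Idea: collapse ~50 Hz SSL features into ~5 units/s, then map each pooled vector
# to a discrete token with k-means.
import numpy as np
from sklearn.cluster import KMeans

def pool_features(feats, in_rate=50, out_rate=5):
    """feats: (T, D) frame-level SSL features at `in_rate` Hz.
    Returns mean-pooled features at roughly `out_rate` units per second."""
    chunk = in_rate // out_rate                       # e.g. 10 frames per pooled unit
    T = (len(feats) // chunk) * chunk
    return feats[:T].reshape(-1, chunk, feats.shape[1]).mean(axis=1)

# Fit a codebook on pooled features from a training set, then tokenize new utterances.
train_feats = np.random.randn(10_000, 768).astype(np.float32)   # placeholder features
codebook = KMeans(n_clusters=500, n_init=10).fit(pool_features(train_feats))

utt = np.random.randn(500, 768).astype(np.float32)               # ~10 s of speech at 50 Hz
tokens = codebook.predict(pool_features(utt))                    # ~50 tokens -> ~5 tokens/s
```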
Synthetic labels are amazing! Do you need an audio labelling machine? Audio Flamingo checkpoints are available on github.com/NVIDIA/audio-f… ...and pre-training with synthetic labels from Audio Flamingo gives large improvements in text-to-audio models arxiv.org/abs/2406.15487
Looking forward to meeting friends at #ICASSP2024

Beautiful work by Alex Liu on generative pre-training for speech with Flow Matching. I just realized it's one of the main components in AudioBox! arxiv.org/abs/2310.16338
Recent years have witnessed significant developments in audio codec models (an overview figure from arxiv.org/abs/2402.13236). We introduce Codec-SUPERB (arxiv.org/abs/2402.13071) to enable fair and comprehensive comparison. Leaderboard: codecsuperb.com
Lin-Shan: if no one asked you to attend the closing ceremony, you're probably not getting the award (and he laughed out loud)
Prof. Lin-Shan Lee remembers all his students… amazing…
LTU and LTU-AS code is released. As usual, it is a full release including training and inference code, pretrained checkpoints, and the datasets. We hope these will be useful. Check github.com/YuanGongND/ltu.
I'll be giving a keynote talk at ASRU'23! asru2023.org/motion.asp?sit… See you soon in Taiwan! Actually, ASRU was the first conference to reject my first-author paper (back in 2003). But 20 years later, I've been given the opportunity to be a keynote speaker, haha.
We summarize our lab's activities toward speech foundation models at wavlab.org/activities/202…. These are selected papers presented at ASRU; we have several other ongoing activities as well.
🚀 Our upgraded audio large language model LTU-2 is now hosted on HuggingFace Space at lnkd.in/eJDpsBY4. Please give it a try and let us know what you think 😀 .
🗣️ Whisper is great for speech recognition, but it only recognizes ~100 languages. What if it wasn't trained on the language that you speak? Happy to introduce my #INTERSPEECH2023 paper comparing Whisper and XLS-R for adaptation to unseen languages! arxiv.org/abs/2305.12606
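As a rough sketch of what adapting Whisper to a new language can look like, here is a generic fine-tuning outline with Hugging Face Transformers. This is not the paper's exact setup: the checkpoint name, placeholder audio, and single-example loop are assumptions for illustration.

```python
# Hedged sketch: continuing Whisper training on a language outside its original ~100
# (a generic fine-tuning outline, not the paper's experimental setup).
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# One (audio, transcript) pair; in practice, iterate over a dataset in the new language.
audio = torch.randn(16000 * 5).numpy()            # placeholder 5 s waveform at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("example transcript", return_tensors="pt").input_ids

model.train()
loss = model(input_features=inputs.input_features, labels=labels).loss
loss.backward()                                   # plug into an optimizer / Trainer loop
```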