Pooneh Mousavi
@MousaviPooneh
“Ever tried. Ever failed. No matter. Try again. Fail again. Fail better.” Samuel Beckett
Our pick of the week by @beomseok_lee_: "ALAS: Measuring Latent Speech-Text Alignment For Spoken Language Understanding In Multimodal LLMs" by Pooneh Mousavi, @yingzhi_wang, @mirco_ravanelli, and @CemSubakan (2025) arxiv.org/abs/2505.19937 #SLU #speech #multimodal #LLM
Speech-language models show promise in multimodal tasks—but how well are speech & text actually aligned? 🤔 This paper arxiv.org/abs/2505.19937 proposes a new metric to measure layer-wise correlation between the two, with a focus on SLU tasks. 🔍🗣️📄
📢 Join our Conversational AI Reading Group! 📅 Thursday, June 19th | 11 AM - 12 PM EST 🎙 Speaker: Yuki Mitsufuji (@mittu1204) - SonyAI 📖 Topic: "AI for Creators: Pushing Creative Abilities to the Next Level" 🔗 Details: (poonehmousavi.github.io/rg)
"Discrete Audio Tokens: More Than a Survey!", Pooneh Mousavi, Gallil Maimon, Adel Moumen, Darius Petermann, Jiatong Shi, Haibin Wu, Haici Yang, Anastasia Kuznetsova, Artem Ploujnikov, Ricard Marxer, Bhuvana Ramabhadran, Benjamin Elizalde, Loren Lugosch… ift.tt/GA4ZC6u
🎵💬 If you are interested in Audio Tokenisers, you should check out our new work! We empirically analysed existing tokenisers from every angle: reconstruction, downstream tasks, LMs and more. Grab yourself a ☕/🍺 and sit down for a read!
🌟🌟 Great collaboration, with a diverse all-star team led by @MousaviPooneh - check it out👇 📄Paper - arxiv.org/abs/2506.10274 🌐Website (+updating tokeniser DB!) - poonehmousavi.github.io/dates-website/
🚀 We're excited to announce our latest work: "Discrete Audio Tokens: More Than a Survey!" It presents a comprehensive survey and benchmark of audio tokenizers across speech, music, and general audio. preprint: arxiv.org/pdf/2506.10274 website: poonehmousavi.github.io/dates-website/
📢 Join our Conversational AI Reading Group! 📅 Thursday, June 12th | 11 AM - 12 PM EST 🎙 Speaker: Andros Tjandra 📖 Topic: "Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound" 🔗 Details: (poonehmousavi.github.io/rg)
📢 Join our Conversational AI Reading Group! 📅 Thursday, May 29th | 11 AM - 12 PM EST 🎙 Speaker: Yossi Adi @adiyossLC 📖 Topic: "On The Landscape of Spoken Language Models" 🔗 Details: (poonehmousavi.github.io/rg)
Learn about speaker diarization, the science behind it, and the future of diarization at @pyannoteAI research labs youtu.be/ECqxZgVevuI?fe…
... in which I'll talk about my decade-old love for speaker diarization and the loss functions used to train underlying neural networks
📢 Join our Conversational AI Reading Group! 📅 Thursday, May 22nd | 11 AM - 12 PM EST 🎙 Speaker: Hervé Bredin (@hbredin) 📖 Topic: "Speaker diarization, a (love) loss story" 🔗 Details: (poonehmousavi.github.io/rg)
🗣️🧠 Speech Language Models require lots of compute to train, right? In our new paper, we test whether it is possible to train an SLM on 1xA5000 GPU in 24 hours. The results may surprise you (they even surprised us)! Tips, open source resources, full paper 👇🏻
@convAI2024 Thank you for having me, and thank you all the listeners! I had a great time 🙌 If you missed it, here's the recording and the slides! Recording: youtube.com/watch?v=REH034… Slides: poonehmousavi.github.io/assets/slides/…
🚨I am honored to give an online invited talk at the Conversational AI Reading Group, MILA @convAI2024 on 5/15 11am-12pm EDT (5/16 0-1am Japan time), titled "Automatic Quality Assessment for Speech and Beyond"! Please find more info on the website: poonehmousavi.github.io/rg
📢 Join our Conversational AI Reading Group! 📅 Thursday, May 15th | 11 AM - 12 PM EST 🎙 Speaker: Wen-Chin Huang (@unilightwf) 📖 Topic: "Automatic Quality Assessment for Speech and Beyond" 🔗 Details: (poonehmousavi.github.io/rg), (youtube.com/@CONVAI_RG)
📢 Join our Conversational AI Reading Group! 📅 Thursday, May 8th | 11 AM - 12 PM EST 🎙 Speaker: Leda Sari 📖 Topic: "The Voicebox Model and Its Applications" 🔗 Details: (poonehmousavi.github.io/rg)
We’re really excited to have Dan Povey join us for our next Conversational AI Reading Group. He is the creator of the Kaldi toolkit and author of many well-known papers. Don’t miss his talk!
📢 Join our Conversational AI Reading Group! 📅 Thursday, May 1st | 11 AM - 12 PM EST 🎙 Speaker: Daniel Povey from Xiaomi Corp. 📖 Topic: "CR-CTC: Consistency regularization on CTC for improved speech recognition" 🔗 Details: (poonehmousavi.github.io/rg)
📢 Join our Conversational AI Reading Group! 📅 Thursday, April 24th | 11 AM - 12 PM EST 🎙 Speaker: Oriol Nieto (@urinieto) from Adobe Research 📖 Topic: "GenAI for Sound Design" 🔗 Details: (poonehmousavi.github.io/rg)
📢 Join our Conversational AI Reading Group! 📅 Thursday, April 17th | 11 AM - 12 PM EST 🎙 Speaker: Titouan Parcollet from Samsung AI Center Cambridge 📖 Topic: "Unsupervised on-device adaptation of a speech recogniser and the Pitfalls of 'SpeechLLM' evaluation"
📢 Join our Conversational AI Reading Group! 📅 Thursday, April 10th | 11 AM - 12 PM EST 🎙 Speaker: Karen Livescu from TTIC 📖 Topic: "Toward Understanding Sign Language in the Real World" 🔗 Details: (poonehmousavi.github.io/rg)
📢 Join our Conversational AI Reading Group! 📅 Thursday, April 3rd | 11 AM - 12 PM EST 🎙 Speaker: Min Ma from Google DeepMind 📖 Topic: "Improving Multilingual Speech Recognition and Language Identification" 🔗 Details: (poonehmousavi.github.io/rg)