Wen-Chin Huang
@unilightwf
Assistant Professor, Graduate School of Informatics, Nagoya University. Voice conversion & synthesis. Trilingual, street dancer, golfer. Tweets are my own opinions.
🚨I am honored to give an online invited talk at the Conversational AI Reading Group, MILA @convAI2024 on 5/15 11am-12pm EDT (5/16 0-1am Japan time), titled "Automatic Quality Assessment for Speech and Beyond"! Please find more info on the website: poonehmousavi.github.io/rg

🚀 We just released Sidon — a multilingual speech restoration model built on the Miipher & Miipher-2 resynthesis framework! Trained on 103 languages and robust to real-world artifacts like wind noise & packet loss 🌍 🔧 Try Sidon with your speech samples! huggingface.co/spaces/sarulab…
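For anyone who prefers scripting over the web demo, here is a minimal sketch of calling a Gradio Space like this from Python. The Space ID and endpoint name below are placeholders (the link above is truncated), so check the Space's "Use via API" tab for the real values.

```python
# Hypothetical sketch: the Space ID and api_name are placeholders, not the real ones.
from gradio_client import Client, handle_file

client = Client("<owner>/<sidon-space>")    # replace with the Space ID from the link above
restored = client.predict(
    handle_file("my_noisy_sample.wav"),     # your own degraded speech recording
    api_name="/predict",                    # check the Space's API docs for the actual endpoint
)
print(restored)                             # typically a path to the restored audio file
```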
@chimechallenge ⭐⭐ We are happy to announce the release of the tasks for the 9th CHiME Speech Separation and Recognition Challenge (CHiME-9). ⚡⚡ Please visit the CHiME Challenge website for details chimechallenge.org ⚡⚡
To my ASR friends: what are some nice course materials/books that introduce ASR? Traditional, modern, either is fine! Please share any good ASR teaching materials or resources you know of!
I meant no disrespect — 100% happy to see all these papers — but SSW may have now become SEW (speech evaluation workshop) blogs.helsinki.fi/ssw13-2025/ful…
1/7 🔗 Introducing STITCH: our new method to make Spoken Language Models (SLMs) think and talk at the same time. Paper link 👉 arxiv.org/abs/2507.15375
ok here goes.... ✨Rebuttal writing advice thread✨ 💫No-one-asked edition💫 🌟Chapter commuting-on-the-bus🌟
🚀 We just released MSR-UTMOS — a powerful model for speech quality prediction that supports 16kHz, 24kHz, and 48kHz audio! 🔍 Powered by a sampling frequency-independent convolutional layer on top of SSL models. 🎧 Upload your own samples and try it now: huggingface.co/spaces/sarulab…
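A rough illustration of what a sampling frequency-independent convolution can look like: store the kernel at a reference rate, keep its duration fixed in seconds, and resample its taps to whatever rate the input arrives at. This is a toy sketch of the general idea only, not the MSR-UTMOS implementation.

```python
# Toy sketch of a sampling-frequency-independent (SFI) conv layer:
# the kernel spans a fixed duration in seconds regardless of the input sampling rate.
import torch
import torch.nn.functional as F

class SFIConv1d(torch.nn.Module):
    def __init__(self, in_ch, out_ch, kernel_sec=0.025, ref_sr=16000):
        super().__init__()
        self.kernel_sec = kernel_sec
        ref_taps = int(kernel_sec * ref_sr)
        # Latent kernel stored at the reference sampling rate.
        self.weight = torch.nn.Parameter(torch.randn(out_ch, in_ch, ref_taps) * 0.01)

    def forward(self, x, sr):
        # Number of taps needed so the kernel still spans `kernel_sec` at rate `sr`.
        taps = max(3, int(self.kernel_sec * sr))
        # Resample the latent kernel along its time axis (linear interpolation).
        w = F.interpolate(self.weight, size=taps, mode="linear", align_corners=True)
        return F.conv1d(x, w, padding=taps // 2)

# 16 kHz, 24 kHz, and 48 kHz inputs all see a ~25 ms receptive field.
layer = SFIConv1d(1, 8)
for sr in (16000, 24000, 48000):
    y = layer(torch.randn(1, 1, sr), sr)   # 1 second of audio at each rate
    print(sr, y.shape)
```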
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe, "OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder," arxiv.org/abs/2507.14129
My first-author paper "Learning Separated Representations for Instrument-based Music Similarity" has been published in APSIPA Transactions on Signal and Information Processing! Deep metric learning that extracts per-instrument (stem) representations from music signals (mixtures). nowpublishers.com/article/Detail… ↑ Open access.
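For readers unfamiliar with the setup, here is a toy sketch of instrument-wise deep metric learning on mixtures: one embedding head per instrument slot, with a triplet loss applied within each instrument's space. The architecture and names are my own illustration, not the paper's model.

```python
# Illustrative sketch only; not the published model.
import torch
import torch.nn.functional as F

class StemEmbedder(torch.nn.Module):
    """Maps a mixture spectrogram to one embedding per instrument slot."""
    def __init__(self, n_bins=128, n_instruments=4, dim=64):
        super().__init__()
        self.backbone = torch.nn.GRU(n_bins, 128, batch_first=True)
        self.heads = torch.nn.Linear(128, n_instruments * dim)
        self.n_instruments, self.dim = n_instruments, dim

    def forward(self, spec):                       # spec: (batch, frames, n_bins)
        h, _ = self.backbone(spec)
        z = self.heads(h.mean(dim=1))              # pool over time
        z = z.view(-1, self.n_instruments, self.dim)
        return F.normalize(z, dim=-1)              # one unit-norm vector per instrument

embedder = StemEmbedder()
anchor = embedder(torch.randn(8, 200, 128))        # mixture spectrograms
positive = embedder(torch.randn(8, 200, 128))      # e.g. other segments of the same songs
negative = embedder(torch.randn(8, 200, 128))      # segments of different songs
# Triplet loss applied separately within each instrument's embedding space.
loss = F.triplet_margin_loss(
    anchor.flatten(0, 1), positive.flatten(0, 1), negative.flatten(0, 1), margin=0.2
)
loss.backward()
```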
Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition. arxiv.org/abs/2507.13626
I posted an article! matplotlib settings for figures in scientific and technical papers [Research] on #Qiita qiita.com/n-taishi/items…
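As a taste of what such settings look like, here is a small, self-contained example of rcParams commonly tuned for paper figures; the specific values are my own illustration, not taken from the Qiita post.

```python
# Example rcParams for print-ready figures (values are illustrative, adjust to your venue).
import matplotlib.pyplot as plt

plt.rcParams.update({
    "figure.figsize": (3.5, 2.5),     # roughly single-column width in inches
    "figure.dpi": 300,                # print-quality resolution
    "font.size": 9,                   # match typical caption text size
    "font.family": "serif",
    "axes.linewidth": 0.8,
    "lines.linewidth": 1.0,
    "legend.frameon": False,
    "savefig.bbox": "tight",
    "pdf.fonttype": 42,               # embed TrueType fonts so text stays editable
})

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label="example")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("figure.pdf")
```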
One of my favorite moments at #ICML2025 was being able to witness @_albertgu and the @cartesia_ai team’s reaction to Mamba being on the coffee sign. Felt surreal seeing someone realize their cultural impact.
VoxATtack: A Multimodal Attack on Voice Anonymization Systems. arxiv.org/abs/2507.12081
So easy to understand!!!
Jinnai-san's talk at the NLP Colloquium is now publicly available → 📺youtu.be/lpjw1AgemWY If you couldn't join on the day, please have a look! ※ Note that the Q&A and discussion are not included in the recording. The slides have also been shared; please see them as well → jinnaiyuu.github.io/pdf/slides/Int…
Great, when VALL-E was out people were like “why model eight codebooks! Just use one!” And now people are saying using one codebook is bad :) arxiv.org/abs/2507.12197
“In speech quality estimation for speech enhancement (SE) systems, subjective listening tests so far are considered as the gold standard.” So true, but so not enforced in the community, no?
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge. arxiv.org/abs/2507.11306
Thought it was a Japanese SQA dataset 😭😭😭😭
JSQA: Speech Quality Assessment with Perceptually-Inspired Contrastive Pretraining Based on JND Audio Pairs. arxiv.org/abs/2507.11636
🔥 Joining #京まふ again for the first time in 3 years 🔥 #WewillB Just ahead of the BD & DVD release ✩BOCCHI THE ROCK! Presents✩ Kyomafu Grand Operation 2025 Sunday, September 21, 10:00-10:40, Miyako Messe Stage ◆ Cast: #青山吉能 #水野朔 #長谷川育美 ▼ Details: kyomaf.kyoto #ぼっち・ざ・ろっく #京まふ2025