arXiv Sound
@ArxivSound
Sound-related articles (cs.SD and eess.AS) on http://arxiv.org
[IMPORTANT] arXiv Sound does not post some papers submitted to arXiv cs.SD or eess.AS, because they do not appear in arXiv's RSS feed. We apologize for the inconvenience.
Miaomiao Gao, Xiaoxiao Xiang, Yiwen Guo, "Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge," arxiv.org/abs/2507.17288
Ryo Terashima, Yuma Shirahata, Masaya Kawamura, "SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch," arxiv.org/abs/2507.17208
Isha Pandey, Pranav Gaikwad, Amruta Parulekar, Ganesh Ramakrishnan, "Technical report: Impact of Duration Prediction on Speaker-specific TTS for Indian Languages," arxiv.org/abs/2507.16875
Yifan Yang, Shujie Liu, Jinyu Li, Yuxuan Hu, Haibin Wu, Hui Wang, Jianwei Yu, Lingwei Meng, Haiyang Sun, Yanqing Liu, Yan Lu, Kai Yu, Xie Chen, "Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis," arxiv.org/abs/2504.10352
Xinwei Cao, Zijian Fan, Torbjørn Svendsen, Giampiero Salvi, "Segmentation-free Goodness of Pronunciation," arxiv.org/abs/2507.16838
Peter Plantinga, Jen-Kai Chen, Roozbeh Sattari, Mirco Ravanelli, Denise Klein, "From Black Box to Biomarker: Sparse Autoencoders for Interpreting Speech Models of Parkinson's Disease," arxiv.org/abs/2507.16836
Nima Yazdani, Ali Ansari, Aruj Mahajan, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, "Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems," arxiv.org/abs/2507.16835
Jordan Madden, Matthew Stone, Dimitri Johnson, Daniel Geddez, "Towards Robust Speech Recognition for Jamaican Patois Music Transcription," arxiv.org/abs/2507.16834
Peter Plantinga, Briac Cordelle, Dominique Louër, Mirco Ravanelli, Denise Klein, "Does Language Matter for Early Detection of Parkinson's Disease from Speech?," arxiv.org/abs/2507.16832
Jinting Wang, Shan Yang, Li Liu, "UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation," arxiv.org/abs/2506.04134
Ekaterina Dmitrieva, Maksim Kaledin, "HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks," arxiv.org/abs/2503.17141
Shehzeen Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Mikyas T. Desta, Roy Fejgin, Rafael Valle, Jason Li, "Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance," arxiv.org/abs/2502.05236
Qibing Bai, Sho Inoue, Shuai Wang, Zhongjie Jiang, Yannan Wang, Haizhou Li, "Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data," arxiv.org/abs/2507.17735
Piotr Masztalski, Michał Romaniuk, Jakub Żak, Mateusz Matuszewski, Konrad Kowalczyk, "Clustering-based hard negative sampling for supervised contrastive speaker verification," arxiv.org/abs/2507.17540
Shanbo Cheng, Yu Bao, et al., "Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice," arxiv.org/abs/2507.17527
Xiaoran Xu, In-Ho Ra, Ravi Sankar, "Enhancing Lung Disease Diagnosis via Semi-Supervised Machine Learning," arxiv.org/abs/2507.16845
Daiqi Liu, Tomás Arias-Vergara, Jana Hutter, Andreas Maier, Paula Andrea Pérez-Toro, "Audio-Vision Contrastive Learning for Phonological Class Recognition," arxiv.org/abs/2507.17682
Qing Wang, Zehan Li, Hang Lv, Hongjie Chen, Yaodong Song, Jian Kang, Jie Lian, Jie Li, Yongxiang Li, Zhongjiang He, Xuelong Li, "BoSS: Beyond-Semantic Speech," arxiv.org/abs/2507.17563
Milena Davudova, Ziyuan Cai, Valentina Giunchiglia, Dragos C. Gruia, Giulia Sanguedolce, Adam Hampshire, Fatemeh Geranmayeh, "Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task," arxiv.org/abs/2507.17326
Tobias Morocutti, Jonathan Greif, Paul Primus, Florian Schmid, Gerhard Widmer, "On Temporal Guidance and Iterative Refinement in Audio Source Separation," arxiv.org/abs/2507.17297