arXiv Sound

@ArxivSound

Sound-related articles (cs.SD and eess.AS) on http://arxiv.org

Joined July 2020

32 Following

6K Followers

Pinned

arXiv Sound@ArxivSound · Oct 20, 2022

[IMPORTANT] arXiv Sound does not post some papers submitted to arXiv cs.SD or eess.AS because they do not appear in arXiv's RSS feed. We apologize for the inconvenience.

arXiv Sound@ArxivSound · Jul 25

Miaomiao Gao, Xiaoxiao Xiang, Yiwen Guo, "Triple X: A LLM-Based Multilingual Speech Recognition System for the INTERSPEECH2025 MLC-SLM Challenge," arxiv.org/abs/2507.17288

arXiv Sound@ArxivSound · Jul 25

Ryo Terashima, Yuma Shirahata, Masaya Kawamura, "SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch," arxiv.org/abs/2507.17208

arXiv Sound@ArxivSound · Jul 25

Isha Pandey, Pranav Gaikwad, Amruta Parulekar, Ganesh Ramakrishnan, "Technical report: Impact of Duration Prediction on Speaker-specific TTS for Indian Languages," arxiv.org/abs/2507.16875

arXiv Sound@ArxivSound · Jul 25

Yifan Yang, Shujie Liu, Jinyu Li, Yuxuan Hu, Haibin Wu, Hui Wang, Jianwei Yu, Lingwei Meng, Haiyang Sun, Yanqing Liu, Yan Lu, Kai Yu, Xie Chen, "Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis," arxiv.org/abs/2504.10352

arXiv Sound@ArxivSound · Jul 25

Xinwei Cao, Zijian Fan, Torbjørn Svendsen, Giampiero Salvi, "Segmentation-free Goodness of Pronunciation," arxiv.org/abs/2507.16838

arXiv Sound@ArxivSound · Jul 25

Peter Plantinga, Jen-Kai Chen, Roozbeh Sattari, Mirco Ravanelli, Denise Klein, "From Black Box to Biomarker: Sparse Autoencoders for Interpreting Speech Models of Parkinson's Disease," arxiv.org/abs/2507.16836

arXiv Sound@ArxivSound · Jul 25

Nima Yazdani, Ali Ansari, Aruj Mahajan, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, "Evaluating Speech-to-Text x LLM x Text-to-Speech Combinations for AI Interview Systems," arxiv.org/abs/2507.16835

arXiv Sound@ArxivSound · Jul 25

Jordan Madden, Matthew Stone, Dimitri Johnson, Daniel Geddez, "Towards Robust Speech Recognition for Jamaican Patois Music Transcription," arxiv.org/abs/2507.16834

arXiv Sound@ArxivSound · Jul 25

Peter Plantinga, Briac Cordelle, Dominique Louër, Mirco Ravanelli, Denise Klein, "Does Language Matter for Early Detection of Parkinson's Disease from Speech?," arxiv.org/abs/2507.16832

arXiv Sound@ArxivSound · Jul 25

Jinting Wang, Shan Yang, Li Liu, "UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation," arxiv.org/abs/2506.04134

arXiv Sound@ArxivSound · Jul 25

Ekaterina Dmitrieva, Maksim Kaledin, "HiFi-Stream: Streaming Speech Enhancement with Generative Adversarial Networks," arxiv.org/abs/2503.17141

arXiv Sound@ArxivSound · Jul 25

Shehzeen Hussain, Paarth Neekhara, Xuesong Yang, Edresson Casanova, Subhankar Ghosh, Mikyas T. Desta, Roy Fejgin, Rafael Valle, Jason Li, "Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance," arxiv.org/abs/2502.05236

arXiv Sound@ArxivSound · Jul 25

Qibing Bai, Sho Inoue, Shuai Wang, Zhongjie Jiang, Yannan Wang, Haizhou Li, "Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data," arxiv.org/abs/2507.17735

arXiv Sound@ArxivSound · Jul 25

Piotr Masztalski, Michał Romaniuk, Jakub Żak, Mateusz Matuszewski, Konrad Kowalczyk, "Clustering-based hard negative sampling for supervised contrastive speaker verification," arxiv.org/abs/2507.17540

arXiv Sound@ArxivSound · Jul 25

Shanbo Cheng, Yu Bao, et al., "Seed LiveInterpret 2.0: End-to-end Simultaneous Speech-to-speech Translation with Your Voice," arxiv.org/abs/2507.17527

arXiv Sound@ArxivSound · Jul 25

Xiaoran Xu, In-Ho Ra, Ravi Sankar, "Enhancing Lung Disease Diagnosis via Semi-Supervised Machine Learning," arxiv.org/abs/2507.16845

arXiv Sound@ArxivSound · Jul 25

Daiqi Liu, Tomás Arias-Vergara, Jana Hutter, Andreas Maier, Paula Andrea Pérez-Toro, "Audio-Vision Contrastive Learning for Phonological Class Recognition," arxiv.org/abs/2507.17682

arXiv Sound@ArxivSound · Jul 25

Qing Wang, Zehan Li, Hang Lv, Hongjie Chen, Yaodong Song, Jian Kang, Jie Lian, Jie Li, Yongxiang Li, Zhongjiang He, Xuelong Li, "BoSS: Beyond-Semantic Speech," arxiv.org/abs/2507.17563

arXiv Sound@ArxivSound · Jul 25

Milena Davudova, Ziyuan Cai, Valentina Giunchiglia, Dragos C. Gruia, Giulia Sanguedolce, Adam Hampshire, Fatemeh Geranmayeh, "Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task," arxiv.org/abs/2507.17326

arXiv Sound@ArxivSound · Jul 25

Tobias Morocutti, Jonathan Greif, Paul Primus, Florian Schmid, Gerhard Widmer, "On Temporal Guidance and Iterative Refinement in Audio Source Separation," arxiv.org/abs/2507.17297
