{"ID":2854639,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.14664","arxiv_id":"2510.14664","title":"SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation","abstract":"Generative speech technologies are progressing rapidly, but evaluating the perceptual quality of synthetic speech remains a core challenge. Existing methods typically rely on scalar scores or binary decisions, which lack interpretability and generalization across tasks and languages. We present SpeechLLM-as-Judges, a new paradigm for enabling large language models (LLMs) to conduct structured and explanation-based speech quality evaluation. To support this direction, we introduce SpeechEval, a large-scale dataset containing 32,207 multilingual speech clips and 128,754 annotations spanning four tasks: quality assessment, pairwise comparison, improvement suggestion, and deepfake detection. Based on this resource, we develop SQ-LLM, a speech-quality-aware LLM trained with chain-of-thought reasoning and reward optimization to improve capability. Experimental results show that SQ-LLM delivers strong performance across tasks and languages, revealing the potential of this paradigm for advancing speech quality evaluation. The relevant code, models, and data are publicly available at https://github.com/NKU-HLT/SpeechLLM-as-Judges.","short_abstract":"Generative speech technologies are progressing rapidly, but evaluating the perceptual quality of synthetic speech remains a core challenge. Existing methods typically rely on scalar scores or binary decisions, which lack interpretability and generalization across tasks and languages. We present SpeechLLM-as-Judges, a n...","url_abs":"https://arxiv.org/abs/2510.14664","url_pdf":"https://arxiv.org/pdf/2510.14664v2","authors":"[\"Hui Wang\",\"Jinghua Zhao\",\"Yifan Yang\",\"Shujie Liu\",\"Junyang Chen\",\"Yanzhe Zhang\",\"Shiwan Zhao\",\"Jinyu Li\",\"Jiaming Zhou\",\"Haoqin Sun\",\"Yan Lu\",\"Yong Qin\"]","published":"2025-10-16T13:19:07Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"eess.AS\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":608177,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2854639,"paper_url":"https://arxiv.org/abs/2510.14664","paper_title":"SpeechLLM-as-Judges: Towards General and Interpretable Speech Quality Evaluation","repo_url":"https://github.com/NKU-HLT/SpeechLLM-as-Judges","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}