{"ID":2877251,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.20916","arxiv_id":"2508.20916","title":"SageLM: A Multi-aspect and Explainable Large Language Model for Speech Judgement","abstract":"Speech-to-Speech (S2S) Large Language Models (LLMs) are foundational to natural human-computer interaction, enabling end-to-end spoken dialogue systems. However, evaluating these models remains a fundamental challenge. We propose SageLM, an end-to-end, multi-aspect, and explainable speech LLM for comprehensive S2S LLMs evaluation. First, unlike cascaded approaches that disregard acoustic features, SageLM jointly assesses both semantic and acoustic dimensions. Second, it leverages rationale-based supervision to enhance explainability and guide model learning, achieving superior alignment with evaluation outcomes compared to rule-based reinforcement learning methods. Third, we introduce SpeechFeedback, a synthetic preference dataset, and employ a two-stage training paradigm to mitigate the scarcity of speech preference data. Trained on both semantic and acoustic dimensions, SageLM achieves an 82.79% agreement rate with human evaluators, outperforming cascaded and SLM-based baselines by at least 7.42% and 26.20%, respectively.","short_abstract":"Speech-to-Speech (S2S) Large Language Models (LLMs) are foundational to natural human-computer interaction, enabling end-to-end spoken dialogue systems. However, evaluating these models remains a fundamental challenge. We propose SageLM, an end-to-end, multi-aspect, and explainable speech LLM for comprehensive...","url_abs":"https://arxiv.org/abs/2508.20916","url_pdf":"https://arxiv.org/pdf/2508.20916v2","authors":"[\"Yuan Ge\",\"Junxiang Zhang\",\"Xiaoqian Liu\",\"Bei Li\",\"Xiangnan Ma\",\"Chenglong Wang\",\"Kaiyang Ye\",\"Yangfan Du\",\"Linfeng Zhang\",\"Yuxin Huang\",\"Tong Xiao\",\"Zhengtao Yu\",\"JingBo Zhu\"]","published":"2025-08-28T15:47:37Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
