{"ID":2842734,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.11691","arxiv_id":"2511.11691","title":"Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues","abstract":"Explainable AI (XAI) for Speech Emotion Recognition (SER) is critical for building transparent, trustworthy models. Current saliency-based methods, adapted from vision, highlight spectrogram regions but fail to show whether these regions correspond to meaningful acoustic markers of emotion, limiting faithfulness and interpretability. We propose a framework that overcomes these limitations by quantifying the magnitudes of cues within salient regions. This clarifies \"what\" is highlighted and connects it to \"why\" it matters, linking saliency to expert-referenced acoustic cues of speech emotions. Experiments on benchmark SER datasets show that our approach improves explanation quality by explicitly linking salient regions to theory-driven, expert-referenced acoustic cues of speech emotion. Compared to standard saliency methods, it provides more understandable and plausible explanations of SER models, offering a foundational step towards trustworthy speech-based affective computing.","short_abstract":"Explainable AI (XAI) for Speech Emotion Recognition (SER) is critical for building transparent, trustworthy models. Current saliency-based methods, adapted from vision, highlight spectrogram regions but fail to show whether these regions correspond to meaningful acoustic markers of emotion, limiting faithfulness and in...","url_abs":"https://arxiv.org/abs/2511.11691","url_pdf":"https://arxiv.org/pdf/2511.11691v1","authors":"[\"Seham Nasr\",\"Zhao Ren\",\"David Johnson\"]","published":"2025-11-12T09:40:36Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.SD\"]","methods":"[]","has_code":false}
