{"ID":3083705,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T06:37:52.911886358Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.06200","arxiv_id":"2606.06200","title":"Learning Emotion-discriminative Representations for Zero-Shot Cross-lingual Speech Emotion Recognition","abstract":"Zero-shot cross-lingual speech emotion recognition (SER) remains challenging due to distribution mismatches across languages and the lack of emotion annotations in the target language. Under such conditions, models trained solely on source-language data frequently suffer from degraded generalization when evaluated on unseen target languages. To address this limitation, we propose an emotion-discriminative representation learning method that integrates supervised contrastive learning and speaker adversarial learning. The contrastive learning promotes cross-lingual emotion alignment, while speaker adversarial learning suppresses speaker-related cues to encourage speaker-invariant representations. Experimental results under a zero-shot cross-lingual SER setting demonstrate that the proposed method significantly improves SER performance over conventional training strategies.","short_abstract":"Zero-shot cross-lingual speech emotion recognition (SER) remains challenging due to distribution mismatches across languages and the lack of emotion annotations in the target language. Under such conditions, models trained solely on source-language data frequently suffer from degraded generalization when evaluated on unsee...","url_abs":"https://arxiv.org/abs/2606.06200","url_pdf":"https://arxiv.org/pdf/2606.06200v1","authors":"[\"Jinyi Mi\",\"Ding Ma\",\"Tomoki Toda\"]","published":"2026-06-04T14:05:38Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"eess.AS\"]","methods":"[]","has_code":false}