{"ID":2857388,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09072","arxiv_id":"2510.09072","title":"Emotion-Disentangled Embedding Alignment for Noise-Robust and Cross-Corpus Speech Emotion Recognition","abstract":"Effectiveness of speech emotion recognition in real-world scenarios is often hindered by noisy environments and variability across datasets. This paper introduces a two-step approach to enhance the robustness and generalization of speech emotion recognition models through improved representation learning. First, our model employs EDRL (Emotion-Disentangled Representation Learning) to extract class-specific discriminative features while preserving shared similarities across emotion categories. Next, MEA (Multiblock Embedding Alignment) refines these representations by projecting them into a joint discriminative latent subspace that maximizes covariance with the original speech input. The learned EDRL-MEA embeddings are subsequently used to train an emotion classifier using clean samples from publicly available datasets, and are evaluated on unseen noisy and cross-corpus speech samples. Improved performance under these challenging conditions demonstrates the effectiveness of the proposed method.","short_abstract":"Effectiveness of speech emotion recognition in real-world scenarios is often hindered by noisy environments and variability across datasets. This paper introduces a two-step approach to enhance the robustness and generalization of speech emotion recognition models through improved representation learning. First, our mo...","url_abs":"https://arxiv.org/abs/2510.09072","url_pdf":"https://arxiv.org/pdf/2510.09072v1","authors":"[\"Upasana Tiwari\",\"Rupayan Chakraborty\",\"Sunil Kumar Kopparapu\"]","published":"2025-10-10T07:17:07Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\",\"cs.HC\",\"cs.LG\",\"eess.AS\"]","methods":"[]","has_code":false}