{"ID":2885891,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.06321","arxiv_id":"2508.06321","title":"EmoAugNet: A Signal-Augmented Hybrid CNN-LSTM Framework for Speech Emotion Recognition","abstract":"Recognizing emotional signals in speech has a significant impact on enhancing the effectiveness of human-computer interaction (HCI). This study introduces EmoAugNet, a hybrid deep learning framework, that incorporates Long Short-Term Memory (LSTM) layers with one-dimensional Convolutional Neural Networks (1D-CNN) to enable reliable Speech Emotion Recognition (SER). The quality and variety of the features that are taken from speech signals have a significant impact on how well SER systems perform. A comprehensive speech data augmentation strategy was used to combine both traditional methods, such as noise addition, pitch shifting, and time stretching, with a novel combination-based augmentation pipeline to enhance generalization and reduce overfitting. Each audio sample was transformed into a high-dimensional feature vector using root mean square energy (RMSE), Mel-frequency Cepstral Coefficient (MFCC), and zero-crossing rate (ZCR). Our model with ReLU activation has a weighted accuracy of 95.78\\% and unweighted accuracy of 92.52\\% on the IEMOCAP dataset and, with ELU activation, has a weighted accuracy of 96.75\\% and unweighted accuracy of 91.28\\%. On the RAVDESS dataset, we get a weighted accuracy of 94.53\\% and 94.98\\% unweighted accuracy for ReLU activation and 93.72\\% weighted accuracy and 94.64\\% unweighted accuracy for ELU activation. These results highlight EmoAugNet's effectiveness in improving the robustness and performance of SER systems through integrated data augmentation and hybrid modeling.","short_abstract":"Recognizing emotional signals in speech has a significant impact on enhancing the effectiveness of human-computer interaction (HCI). 
This study introduces EmoAugNet, a hybrid deep learning framework, that incorporates Long Short-Term Memory (LSTM) layers with one-dimensional Convolutional Neural Networks (1D-CNN) to en...","url_abs":"https://arxiv.org/abs/2508.06321","url_pdf":"https://arxiv.org/pdf/2508.06321v1","authors":"[\"Durjoy Chandra Paul\",\"Gaurob Saha\",\"Md Amjad Hossain\"]","published":"2025-08-06T16:28:27Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.HC\",\"cs.LG\"]","methods":"[\"Convolutional Neural Network\"]","has_code":false}
