{"ID":2875891,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.01401","arxiv_id":"2509.01401","title":"ArabEmoNet: A Lightweight Hybrid 2D CNN-BiLSTM Model with Attention for Robust Arabic Speech Emotion Recognition","abstract":"Speech emotion recognition is vital for human-computer interaction, particularly for low-resource languages like Arabic, which face challenges due to limited data and research. We introduce ArabEmoNet, a lightweight architecture designed to overcome these limitations and deliver state-of-the-art performance. Unlike previous systems relying on discrete MFCC features and 1D convolutions, which miss nuanced spectro-temporal patterns, ArabEmoNet uses Mel spectrograms processed through 2D convolutions, preserving critical emotional cues often lost in traditional methods. While recent models favor large-scale architectures with millions of parameters, ArabEmoNet achieves superior results with just 1 million parameters, 90 times smaller than HuBERT base and 74 times smaller than Whisper. This efficiency makes it ideal for resource-constrained environments. ArabEmoNet advances Arabic speech emotion recognition, offering exceptional performance and accessibility for real-world applications.","short_abstract":"Speech emotion recognition is vital for human-computer interaction, particularly for low-resource languages like Arabic, which face challenges due to limited data and research. We introduce ArabEmoNet, a lightweight architecture designed to overcome these limitations and deliver state-of-the-art performance. Unlike pre...","url_abs":"https://arxiv.org/abs/2509.01401","url_pdf":"https://arxiv.org/pdf/2509.01401v1","authors":"[\"Ali Abouzeid\",\"Bilal Elbouardi\",\"Mohamed Maged\",\"Shady Shehata\"]","published":"2025-09-01T11:51:38Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.CL\",\"eess.AS\"]","methods":"[\"Convolutional Neural Network\"]","has_code":false}
