{"ID":2842159,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.10090","arxiv_id":"2511.10090","title":"ELYADATA \u0026 LIA at NADI 2025: ASR and ADI Subtasks","abstract":"This paper describes Elyadata \u0026 LIA's joint submission to the NADI multi-dialectal Arabic Speech Processing 2025. We participated in the Spoken Arabic Dialect Identification (ADI) and multi-dialectal Arabic ASR subtasks. Our submission ranked first for the ADI subtask and second for the multi-dialectal Arabic ASR subtask among all participants. Our ADI system is a fine-tuned Whisper-large-v3 encoder with data augmentation. This system obtained the highest ADI accuracy score of 79.83% on the official test set. For multi-dialectal Arabic ASR, we fine-tuned SeamlessM4T-v2 Large (Egyptian variant) separately for each of the eight considered dialects. Overall, we obtained an average WER and CER of 38.54% and 14.53%, respectively, on the test set. Our results demonstrate the effectiveness of large pre-trained speech models with targeted fine-tuning for Arabic speech processing.","short_abstract":"This paper describes Elyadata \u0026 LIA's joint submission to the NADI multi-dialectal Arabic Speech Processing 2025. We participated in the Spoken Arabic Dialect Identification (ADI) and multi-dialectal Arabic ASR subtasks. Our submission ranked first for the ADI subtask and second for the multi-dialectal Arabic ASR subt...","url_abs":"https://arxiv.org/abs/2511.10090","url_pdf":"https://arxiv.org/pdf/2511.10090v1","authors":"[\"Haroun Elleuch\",\"Youssef Saidi\",\"Salima Mdhaffar\",\"Yannick Estève\",\"Fethi Bougares\"]","published":"2025-11-13T08:44:39Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
