{"ID":2875004,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.03256","arxiv_id":"2509.03256","title":"Comparison of End-to-end Speech Assessment Models for the NOCASA 2025 Challenge","abstract":"This paper presents an analysis of three end-to-end models developed for the NOCASA 2025 Challenge, aimed at automatic word-level pronunciation assessment for children learning Norwegian as a second language. Our models include an encoder-decoder Siamese architecture (E2E-R), a prefix-tuned direct classification model leveraging pretrained wav2vec2.0 representations, and a novel model integrating alignment-free goodness-of-pronunciation (GOP) features computed via CTC. We introduce a weighted ordinal cross-entropy loss tailored for optimizing metrics such as unweighted average recall and mean absolute error. Among the explored methods, our GOP-CTC-based model achieved the highest performance, substantially surpassing challenge baselines and attaining top leaderboard scores.","short_abstract":"This paper presents an analysis of three end-to-end models developed for the NOCASA 2025 Challenge, aimed at automatic word-level pronunciation assessment for children learning Norwegian as a second language. Our models include an encoder-decoder Siamese architecture (E2E-R), a prefix-tuned direct classification model...","url_abs":"https://arxiv.org/abs/2509.03256","url_pdf":"https://arxiv.org/pdf/2509.03256v1","authors":"[\"Aleksei Žavoronkov\",\"Tanel Alumäe\"]","published":"2025-09-03T12:17:59Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.SD\",\"eess.AS\"]","methods":"[]","has_code":false}
