{"ID":2893247,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.14346","arxiv_id":"2507.14346","title":"Towards Accurate Phonetic Error Detection Through Phoneme Similarity Modeling","abstract":"Phonetic error detection, a core subtask of automatic pronunciation assessment, identifies pronunciation deviations at the phoneme level. Speech variability from accents and dysfluencies challenges accurate phoneme recognition, with current models failing to capture these discrepancies effectively. We propose a verbatim phoneme recognition framework using multi-task training with novel phoneme similarity modeling that transcribes what speakers actually say rather than what they're supposed to say. We develop and open-source \\textit{VCTK-accent}, a simulated dataset containing phonetic errors, and propose two novel metrics for assessing pronunciation differences. Our work establishes a new benchmark for phonetic error detection.","short_abstract":"Phonetic error detection, a core subtask of automatic pronunciation assessment, identifies pronunciation deviations at the phoneme level. Speech variability from accents and dysfluencies challenges accurate phoneme recognition, with current models failing to capture these discrepancies effectively. We propose a verbati...","url_abs":"https://arxiv.org/abs/2507.14346","url_pdf":"https://arxiv.org/pdf/2507.14346v1","authors":"[\"Xuanru Zhou\",\"Jiachen Lian\",\"Cheol Jun Cho\",\"Tejas Prabhune\",\"Shuhe Li\",\"William Li\",\"Rodrigo Ortiz\",\"Zoe Ezzes\",\"Jet Vonk\",\"Brittany Morin\",\"Rian Bogley\",\"Lisa Wauters\",\"Zachary Miller\",\"Maria Gorno-Tempini\",\"Gopala Anumanchipalli\"]","published":"2025-07-18T19:51:56Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.SD\"]","methods":"[]","has_code":false}