{"ID":2882052,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.11771","arxiv_id":"2508.11771","title":"Investigating Transcription Normalization in the Faetar ASR Benchmark","abstract":"We examine the role of transcription inconsistencies in the Faetar Automatic Speech Recognition benchmark, a challenging low-resource ASR benchmark. With the help of a small, hand-constructed lexicon, we find that, while inconsistencies do exist in the transcriptions, they are not the main challenge in the task. We also demonstrate that bigram word-based language modelling is of no added benefit, but that constraining decoding to a finite lexicon can be beneficial. The task remains extremely difficult.","short_abstract":"We examine the role of transcription inconsistencies in the Faetar Automatic Speech Recognition benchmark, a challenging low-resource ASR benchmark. With the help of a small, hand-constructed lexicon, we find that, while inconsistencies do exist in the transcriptions, they are not the main challenge in th...","url_abs":"https://arxiv.org/abs/2508.11771","url_pdf":"https://arxiv.org/pdf/2508.11771v2","authors":"[\"Leo Peckham\",\"Michael Ong\",\"Naomi Nagy\",\"Ewan Dunbar\"]","published":"2025-08-15T18:41:25Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false}