{"ID":2867485,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17349","arxiv_id":"2509.17349","title":"Better Late Than Never: Meta-Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation","abstract":"Simultaneous speech-to-text translation systems must balance translation quality with latency. Although quality evaluation is well established, latency measurement remains a challenge. Existing metrics produce inconsistent results, especially in short-form settings with artificial presegmentation. We present the first comprehensive meta-evaluation of latency metrics across language pairs and systems. We uncover a structural bias in current metrics related to segmentation. We introduce YAAL (Yet Another Average Lagging) for a more accurate short-form evaluation and LongYAAL for unsegmented audio. We propose SoftSegmenter, a resegmentation tool based on soft word-level alignment. We show that YAAL and LongYAAL, together with SoftSegmenter, outperform popular latency metrics, enabling more reliable assessments of short- and long-form simultaneous speech translation systems. We implement all artifacts within the OmniSTEval toolkit: https://github.com/pe-trik/OmniSTEval.","short_abstract":"Simultaneous speech-to-text translation systems must balance translation quality with latency. Although quality evaluation is well established, latency measurement remains a challenge. Existing metrics produce inconsistent results, especially in short-form settings with artificial presegmentation. We present the first...","url_abs":"https://arxiv.org/abs/2509.17349","url_pdf":"https://arxiv.org/pdf/2509.17349v2","authors":"[\"Peter Polák\",\"Sara Papi\",\"Luisa Bentivogli\",\"Ondřej Bojar\"]","published":"2025-09-22T04:21:19Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":609468,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2867485,"paper_url":"https://arxiv.org/abs/2509.17349","paper_title":"Better Late Than Never: Meta-Evaluation of Latency Metrics for Simultaneous Speech-to-Text Translation","repo_url":"https://github.com/pe-trik/OmniSTEval","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}