{"ID":2869429,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.15095","arxiv_id":"2509.15095","title":"Listening, Imagining \u0026 Refining: A Heuristic Optimized ASR Correction Framework with LLMs","abstract":"Automatic Speech Recognition (ASR) systems remain prone to errors that affect downstream applications. In this paper, we propose LIR-ASR, a heuristic optimized iterative correction framework using LLMs, inspired by human auditory perception. LIR-ASR applies a \"Listening-Imagining-Refining\" strategy, generating phonetic variants and refining them in context. A heuristic optimization with finite state machine (FSM) is introduced to prevent the correction process from being trapped in local optima and rule-based constraints help maintain semantic fidelity. Experiments on both English and Chinese ASR outputs show that LIR-ASR achieves average reductions in CER/WER of up to 1.5 percentage points compared to baselines, demonstrating substantial accuracy gains in transcription.","short_abstract":"Automatic Speech Recognition (ASR) systems remain prone to errors that affect downstream applications. In this paper, we propose LIR-ASR, a heuristic optimized iterative correction framework using LLMs, inspired by human auditory perception. LIR-ASR applies a \"Listening-Imagining-Refining\" strategy, generating phonetic...","url_abs":"https://arxiv.org/abs/2509.15095","url_pdf":"https://arxiv.org/pdf/2509.15095v2","authors":"[\"Yutong Liu\",\"Ziyue Zhang\",\"Cheng Huang\",\"Yongbin Yu\",\"Xiangxiang Wang\",\"Yuqing Cai\",\"Nyima Tashi\"]","published":"2025-09-18T15:50:54Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\"]","methods":"[\"Large Language Model\"]","has_code":false}
