{"ID":2827900,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.22165","arxiv_id":"2512.22165","title":"Marco-ASR: A Principled and Metric-Driven Framework for Fine-Tuning Large-Scale ASR Models for Domain Adaptation","abstract":"Automatic Speech Recognition (ASR) models have achieved remarkable accuracy in general settings, yet their performance often degrades in domain-specific applications due to data mismatch and linguistic variability. This challenge is amplified for modern Large Language Model (LLM)-based ASR systems, whose massive scale and complex training dynamics make effective fine-tuning non-trivial. To address this gap, this paper proposes a principled and metric-driven fine-tuning framework for adapting both traditional and LLM-based ASR models to specialized domains. The framework emphasizes learning rate optimization based on performance metrics, combined with domain-specific data transformation and augmentation. We empirically evaluate our framework on state-of-the-art models, including Whisper, Whisper-Turbo, and Qwen2-Audio, across multi-domain, multilingual, and multi-length datasets. Our results not only validate the proposed framework but also establish practical protocols for improving domain-specific ASR performance while preventing overfitting.","short_abstract":"Automatic Speech Recognition (ASR) models have achieved remarkable accuracy in general settings, yet their performance often degrades in domain-specific applications due to data mismatch and linguistic variability. This challenge is amplified for modern Large Language Model (LLM)-based ASR systems, whose massive scale...","url_abs":"https://arxiv.org/abs/2512.22165","url_pdf":"https://arxiv.org/pdf/2512.22165v1","authors":"[\"Xuanfan Ni\",\"Fei Yang\",\"Fengping Tian\",\"Qingjuan Li\",\"Chenyang Lyu\",\"Yichao Du\",\"Longyue Wang\",\"Weihua Luo\",\"Kaifu Zhang\"]","published":"2025-12-17T07:31:34Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
