{"ID":2863507,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24613","arxiv_id":"2509.24613","title":"HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition","abstract":"Despite advances in multilingual automatic speech recognition (ASR), code-switching (CS), the mixing of languages within an utterance common in daily speech, remains a severely underexplored challenge. In this paper, we introduce HiKE: the Hierarchical Korean-English code-switching benchmark, the first globally accessible non-synthetic evaluation framework for Korean-English CS, aiming to provide a means for the precise evaluation of multilingual ASR models and to foster research in the field. The proposed framework not only consists of high-quality, natural CS data across various topics, but also provides meticulous loanword labels and a hierarchical CS-level labeling scheme (word, phrase, and sentence) that together enable a systematic evaluation of a model's ability to handle each distinct level of code-switching. Through evaluations of diverse multilingual ASR models and fine-tuning experiments, this paper demonstrates that although most multilingual ASR models initially exhibit inadequate CS-ASR performance, this capability can be enabled through fine-tuning with synthetic CS data. HiKE is available at https://github.com/ThetaOne-AI/HiKE.","short_abstract":"Despite advances in multilingual automatic speech recognition (ASR), code-switching (CS), the mixing of languages within an utterance common in daily speech, remains a severely underexplored challenge. In this paper, we introduce HiKE: the Hierarchical Korean-English code-switching benchmark, the first globally accessi...","url_abs":"https://arxiv.org/abs/2509.24613","url_pdf":"https://arxiv.org/pdf/2509.24613v4","authors":"[\"Gio Paik\",\"Yongbeom Kim\",\"Soungmin Lee\",\"Sangmin Ahn\",\"Chanwoo Kim\"]","published":"2025-09-29T11:18:13Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.SD\",\"eess.AS\"]","methods":"[]","has_code":false,"code_links":[{"ID":609019,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2863507,"paper_url":"https://arxiv.org/abs/2509.24613","paper_title":"HiKE: Hierarchical Evaluation Framework for Korean-English Code-Switching Speech Recognition","repo_url":"https://github.com/ThetaOne-AI/HiKE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
