{"ID":2870542,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.13395","arxiv_id":"2509.13395","title":"TICL: Text-Embedding KNN For Speech In-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models","abstract":"Speech foundation models have recently demonstrated the ability to perform Speech In-Context Learning (SICL). Selecting effective in-context examples is crucial for SICL performance, yet selection methodologies remain underexplored. In this work, we propose Text-Embedding KNN for SICL (TICL), a simple pipeline that uses semantic context to enhance off-the-shelf large multimodal models' speech recognition ability without fine-tuning. Across challenging automatic speech recognition tasks, including accented English, multilingual speech, and children's speech, our method enables models to surpass zero-shot performance with up to 84.7% relative WER reduction. We conduct ablation studies to show the robustness and efficiency of our method.","short_abstract":"Speech foundation models have recently demonstrated the ability to perform Speech In-Context Learning (SICL). Selecting effective in-context examples is crucial for SICL performance, yet selection methodologies remain underexplored. In this work, we propose Text-Embedding KNN for SICL (TICL), a simple pipeline that use...","url_abs":"https://arxiv.org/abs/2509.13395","url_pdf":"https://arxiv.org/pdf/2509.13395v1","authors":"[\"Haolong Zheng\",\"Yekaterina Yegorova\",\"Mark Hasegawa-Johnson\"]","published":"2025-09-16T17:07:23Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\",\"cs.CL\",\"cs.LG\",\"cs.MM\"]","methods":"[]","has_code":false}
