{"ID":2826703,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.18263","arxiv_id":"2512.18263","title":"TICL+: A Case Study On Speech In-Context Learning for Children's Speech Recognition","abstract":"Children's speech recognition remains challenging due to substantial acoustic and linguistic variability, limited labeled data, and significant differences from adult speech. Speech foundation models can address these challenges through Speech In-Context Learning (SICL), allowing adaptation to new domains without fine-tuning. However, the effectiveness of SICL depends on how in-context examples are selected. We extend an existing retrieval-based method, Text-Embedding KNN for SICL (TICL), introducing an acoustic reranking step to create TICL+. This extension prioritizes examples that are both semantically and acoustically aligned with the test input. Experiments on four children's speech corpora show that TICL+ achieves up to a 53.3% relative word error rate reduction over zero-shot performance and 37.6% over baseline TICL, highlighting the value of combining semantic and acoustic information for robust, scalable ASR in children's speech.","short_abstract":"Children's speech recognition remains challenging due to substantial acoustic and linguistic variability, limited labeled data, and significant differences from adult speech. Speech foundation models can address these challenges through Speech In-Context Learning (SICL), allowing adaptation to new domains without fine-...","url_abs":"https://arxiv.org/abs/2512.18263","url_pdf":"https://arxiv.org/pdf/2512.18263v1","authors":"[\"Haolong Zheng\",\"Yekaterina Yegorova\",\"Mark Hasegawa-Johnson\"]","published":"2025-12-20T08:03:07Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\",\"cs.CL\",\"cs.LG\"]","methods":"[]","has_code":false}
