{"ID":2879731,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.15316","arxiv_id":"2508.15316","title":"CUPE: Contextless Universal Phoneme Encoder for Language-Agnostic Speech Processing","abstract":"Universal phoneme recognition typically requires analyzing long speech segments and language-specific patterns. Many speech processing tasks require pure phoneme representations free from contextual influence, which motivated our development of CUPE - a lightweight model that captures key phoneme features in just 120 milliseconds, about one phoneme's length. CUPE processes short, fixed-width windows independently and, despite fewer parameters than current approaches, achieves competitive cross-lingual performance by learning fundamental acoustic patterns common to all languages. Our extensive evaluation through supervised and self-supervised training on diverse languages, including zero-shot tests on the UCLA Phonetic Corpus, demonstrates strong cross-lingual generalization and reveals that effective universal speech processing is possible through modeling basic acoustic patterns within phoneme-length windows.","short_abstract":"Universal phoneme recognition typically requires analyzing long speech segments and language-specific patterns. Many speech processing tasks require pure phoneme representations free from contextual influence, which motivated our development of CUPE - a lightweight model that captures key phoneme features in just 120 m...","url_abs":"https://arxiv.org/abs/2508.15316","url_pdf":"https://arxiv.org/pdf/2508.15316v1","authors":"[\"Abdul Rehman\",\"Jian-Jun Zhang\",\"Xiaosong Yang\"]","published":"2025-08-21T07:27:10Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.LG\",\"eess.AS\"]","methods":"[]","has_code":false}
