{"ID":2863001,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.26634","arxiv_id":"2509.26634","title":"Scaling Spoken Language Models with Syllabic Speech Tokenization","abstract":"Spoken language models (SLMs) typically discretize speech into high-frame-rate tokens extracted from SSL speech models. As the most successful LMs are based on the Transformer architecture, processing these long token streams with self-attention is expensive, as attention scales quadratically with sequence length. A recent SSL work introduces acoustic tokenization of speech at the syllable level, which is more interpretable and potentially more scalable with significant compression in token lengths (4-5 Hz). Yet, their value for spoken language modeling is not yet fully explored. We present the first systematic study of syllabic tokenization for spoken language modeling, evaluating models on a suite of SLU benchmarks while varying training data scale. Syllabic tokens can match or surpass the previous high-frame rate tokens while significantly cutting training and inference costs, achieving more than a 2x reduction in training time and a 5x reduction in FLOPs. Our findings highlight syllable-level language modeling as a promising path to efficient long-context spoken language models.","short_abstract":"Spoken language models (SLMs) typically discretize speech into high-frame-rate tokens extracted from SSL speech models. As the most successful LMs are based on the Transformer architecture, processing these long token streams with self-attention is expensive, as attention scales quadratically with sequence length. A re...","url_abs":"https://arxiv.org/abs/2509.26634","url_pdf":"https://arxiv.org/pdf/2509.26634v2","authors":"[\"Nicholas Lee\",\"Cheol Jun Cho\",\"Alan W Black\",\"Gopala K. Anumanchipalli\"]","published":"2025-09-30T17:59:09Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"eess.AS\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}
