{"ID":2865267,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.22220","arxiv_id":"2509.22220","title":"StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs","abstract":"Prevalent semantic speech tokenizers, designed to capture linguistic content, are surprisingly fragile. We find they are not robust to meaning-irrelevant acoustic perturbations; even at high Signal-to-Noise Ratios (SNRs) where speech is perfectly intelligible, their output token sequences can change drastically, increasing the learning burden for downstream LLMs. This instability stems from two flaws: a brittle single-path quantization architecture and a distant training signal indifferent to intermediate token stability. To address this, we introduce StableToken, a tokenizer that achieves stability through a consensus-driven mechanism. Its multi-branch architecture processes audio in parallel, and these representations are merged via a powerful bit-wise voting mechanism to form a single, stable token sequence. StableToken sets a new state-of-the-art in token stability, drastically reducing Unit Edit Distance (UED) under diverse noise conditions. This foundational stability translates directly to downstream benefits, significantly improving the robustness of SpeechLLMs on a variety of tasks. Our code and model are publicly available at https://github.com/Tencent/StableToken.","short_abstract":"Prevalent semantic speech tokenizers, designed to capture linguistic content, are surprisingly fragile. We find they are not robust to meaning-irrelevant acoustic perturbations; even at high Signal-to-Noise Ratios (SNRs) where speech is perfectly intelligible, their output token sequences can change drastically, increa...","url_abs":"https://arxiv.org/abs/2509.22220","url_pdf":"https://arxiv.org/pdf/2509.22220v2","authors":"[\"Yuhan Song\",\"Linhao Zhang\",\"Chuhan Wu\",\"Aiwei Liu\",\"Wei Jia\",\"Houfeng Wang\",\"Xiao Zhou\"]","published":"2025-09-26T11:32:51Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.SD\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":609251,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2865267,"paper_url":"https://arxiv.org/abs/2509.22220","paper_title":"StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs","repo_url":"https://github.com/Tencent/StableToken","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
