{"ID":2836092,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.22687","arxiv_id":"2511.22687","title":"PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning","abstract":"Neural speech codecs have achieved strong performance in low-bitrate compression, but residual vector quantization (RVQ) often suffers from unstable training and ineffective decomposition, limiting reconstruction quality and efficiency. We propose PURE Codec (Progressive Unfolding of Residual Entropy), a novel framework that guides multi-stage quantization using a pre-trained speech enhancement model. The first quantization stage reconstructs low-entropy, denoised speech embeddings, while subsequent stages encode residual high-entropy components. This design improves training stability significantly. Experiments demonstrate that PURE consistently outperforms conventional RVQ-based codecs in reconstruction and downstream speech language model-based text-to-speech, particularly under noisy training conditions.","short_abstract":"Neural speech codecs have achieved strong performance in low-bitrate compression, but residual vector quantization (RVQ) often suffers from unstable training and ineffective decomposition, limiting reconstruction quality and efficiency. We propose PURE Codec (Progressive Unfolding of Residual Entropy), a novel framewor...","url_abs":"https://arxiv.org/abs/2511.22687","url_pdf":"https://arxiv.org/pdf/2511.22687v1","authors":"[\"Jiatong Shi\",\"Haoran Wang\",\"William Chen\",\"Chenda Li\",\"Wangyou Zhang\",\"Jinchuan Tian\",\"Shinji Watanabe\"]","published":"2025-11-27T18:40:08Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"eess.AS\"]","methods":"[\"Language Model\"]","has_code":false}
