{"ID":2853072,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.16718","arxiv_id":"2510.16718","title":"U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation","abstract":"We propose \\textbf{U-Codec}, an \\textbf{U}ltra low frame-rate neural speech \\textbf{Codec} that achieves high-fidelity reconstruction and fast speech generation at an extremely low frame-rate of 5Hz (5 frames per second). Extreme compression at 5Hz typically leads to severe intelligibility and spectral detail loss. To address this, we introduce a Transformer-based inter-frame long-term dependency module and systematically explore residual vector quantization (RVQ) depth and codebook size to identify optimal configurations. Moreover, we apply U-Codec to a large language model (LLM)-based auto-regressive TTS model, which leverages a global and local hierarchical architecture to effectively capture dependencies across multi-layer tokens. We extend LLM-based TTS from 3-layer RVQ at 50Hz to 32-layer RVQ at 5Hz. Experimental results demonstrate that U-Codec improves LLM-based TTS inference speed by around 3 $\\times$ over high-frame-rate codecs while maintaining similarity and naturalness. These results validate the feasibility of using highly compressed 5Hz discrete tokens for fast and high-fidelity speech synthesis.","short_abstract":"We propose \\textbf{U-Codec}, an \\textbf{U}ltra low frame-rate neural speech \\textbf{Codec} that achieves high-fidelity reconstruction and fast speech generation at an extremely low frame-rate of 5Hz (5 frames per second). 
Extreme compression at 5Hz typically leads to severe intelligibility and spectral detail loss, we...","url_abs":"https://arxiv.org/abs/2510.16718","url_pdf":"https://arxiv.org/pdf/2510.16718v1","authors":"[\"Xusheng Yang\",\"Long Zhou\",\"Wenfu Wang\",\"Kai Hu\",\"Shulin Feng\",\"Chenxing Li\",\"Meng Yu\",\"Dong Yu\",\"Yuexian Zou\"]","published":"2025-10-19T05:09:20Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Transformer\",\"Large Language Model\",\"Language Model\"]","has_code":false}
