{"ID":2862139,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.00981","arxiv_id":"2510.00981","title":"FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates","abstract":"Neural audio codecs are foundational to speech language models. They are expected to have a low frame rate and decoupled semantic and acoustic information. A lower frame rate codec can reduce the computational cost of speech language models by shortening the sequence length. Recent studies have developed 12.5Hz low-frame-rate audio codecs, but even lower frame rate codecs remain underexplored. We find that a major challenge for very low frame rate tokens is missing semantic information. This paper introduces FlexiCodec to address this limitation. FlexiCodec improves semantic preservation with a dynamic frame rate approach and introduces a novel architecture featuring an ASR feature-assisted dual stream encoding and Transformer bottlenecks. With dynamic frame rates, it uses fewer frames in information-sparse regions by adaptively merging semantically similar frames. A dynamic frame rate also allows FlexiCodec to support inference-time controllable frame rates between 3Hz and 12.5Hz. Experiments on 6.25Hz, 8.3Hz and 12.5Hz average frame rates confirm that FlexiCodec excels over baseline systems in semantic information preservation and delivers high audio reconstruction quality. We also validate the effectiveness of FlexiCodec in language model-based TTS. Demos are available at: https://flexicodec.github.io. Code is available at: https://github.com/amphionteam/flexicodec.","short_abstract":"Neural audio codecs are foundational to speech language models. They are expected to have a low frame rate and decoupled semantic and acoustic information. A lower frame rate codec can reduce the computational cost of speech language models by shortening the sequence length. 
Recent studies have developed 12.5Hz low-frame-...","url_abs":"https://arxiv.org/abs/2510.00981","url_pdf":"https://arxiv.org/pdf/2510.00981v3","authors":"[\"Jiaqi Li\",\"Yao Qian\",\"Yuxuan Hu\",\"Leying Zhang\",\"Xiaofei Wang\",\"Heng Lu\",\"Manthan Thakker\",\"Jinyu Li\",\"Sheng Zhao\",\"Zhizheng Wu\"]","published":"2025-10-01T14:56:18Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false,"code_links":[{"ID":608876,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2862139,"paper_url":"https://arxiv.org/abs/2510.00981","paper_title":"FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates","repo_url":"https://github.com/amphionteam/flexicodec","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
