{"ID":2853834,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.15227","arxiv_id":"2510.15227","title":"LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models","abstract":"This paper presents LongCat-Audio-Codec, an audio tokenizer and detokenizer solution designed for industrial grade end-to-end speech large language models. By leveraging a decoupled model architecture and a multistage training strategy, LongCat-Audio-Codec exhibits robust semantic modeling capabilities, flexible acoustic feature extraction capabilities, and low-latency streaming synthesis capabilities. It encodes speech at an ultra-low frame rate of 16.67 Hz, with a minimum bitrate of 0.43 kbps and a maximum bitrate of 0.87 kbps. Evaluation results demonstrate that LongCat-Audio-Codec achieves strong speech intelligibility and is capable of synthesizing high-quality speech at low bitrate, thus effectively balancing coding efficiency and decoding quality. The inference code and model checkpoints of LongCat-Audio-Codec are available at: https://github.com/meituan-longcat/LongCat-Audio-Codec.","short_abstract":"This paper presents LongCat-Audio-Codec, an audio tokenizer and detokenizer solution designed for industrial grade end-to-end speech large language models. 
By leveraging a decoupled model architecture and a multistage training strategy, LongCat-Audio-Codec exhibits robust semantic modeling capabilities, flexible acoust...","url_abs":"https://arxiv.org/abs/2510.15227","url_pdf":"https://arxiv.org/pdf/2510.15227v1","authors":"[\"Xiaohan Zhao\",\"Hongyu Xiang\",\"Shengze Ye\",\"Song Li\",\"Zhengkun Tian\",\"Guanyu Chen\",\"Ke Ding\",\"Guanglu Wan\"]","published":"2025-10-17T01:33:57Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.SD\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":608088,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2853834,"paper_url":"https://arxiv.org/abs/2510.15227","paper_title":"LongCat-Audio-Codec: An Audio Tokenizer and Detokenizer Solution Designed for Speech Large Language Models","repo_url":"https://github.com/meituan-longcat/LongCat-Audio-Codec","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
