{"ID":3083902,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T06:54:00.442624098Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.05892","arxiv_id":"2606.05892","title":"VoCodec: A Low-bitrate Streamable Neural Speech Codec with Voicing-driven Quantization","abstract":"Neural speech codecs are key to speech transmission and storage, but most use uniform quantization across frames, allocating the same bitrate regardless of content and wasting bits. We propose VoCodec, a low-bitrate streamable neural speech codec with voicing-driven quantization that assigns higher bitrate to voiced frames and lower bitrate to unvoiced frames according to perceptual sensitivity. VoCodec embeds a voicing detector in a fully causal encoder-quantizer-decoder neural coding framework, using residual scalar-vector quantization for voiced frames and simple scalar quantization for unvoiced ones. Experiments show that on the LibriTTS dataset at a 16 kHz sampling rate, VoCodec outperforms baseline neural speech codecs even at a bitrate as low as 1.1 kbps. Further experiments confirm that introducing voicing-driven quantization reduces the bitrate by approximately 27% compared with a uniform quantization strategy.","short_abstract":"Neural speech codecs are key to speech transmission and storage, but most use uniform quantization across frames, allocating the same bitrate regardless of content and wasting bits. We propose VoCodec, a low-bitrate streamable neural speech codec with voicing-driven quantization that assigns higher bitrate to voiced fr...","url_abs":"https://arxiv.org/abs/2606.05892","url_pdf":"https://arxiv.org/pdf/2606.05892v1","authors":"[\"Xiao-Hang Jiang\",\"Yang Ai\",\"Rui-Chen Zheng\",\"Li-Rong Dai\",\"Zhen-Hua Ling\",\"Ji Wu\"]","published":"2026-06-04T09:00:13Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[]","has_code":false}
