{"ID":2865100,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.21968","arxiv_id":"2509.21968","title":"AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook","abstract":"We propose AUV, a unified neural audio codec with a single codebook, which enables a favourable reconstruction of speech and further extends to general audio, including vocal, music, and sound. AUV is capable of tackling any 16 kHz mixed-domain audio segment at bit rates around 700 bps. To accomplish this, we guide the matryoshka codebook with nested domain-specific partitions, assigned with corresponding teacher models to perform distillation, all in a single-stage training. A conformer-style encoder-decoder architecture with STFT features as audio representation is employed, yielding better audio quality. Comprehensive evaluations demonstrate that AUV exhibits comparable audio reconstruction ability to state-of-the-art domain-specific single-layer quantizer codecs, showcasing the potential of audio universal vector quantization with a single codebook. The pre-trained model and demo samples are available at https://swivid.github.io/AUV/.","short_abstract":"We propose AUV, a unified neural audio codec with a single codebook, which enables a favourable reconstruction of speech and further extends to general audio, including vocal, music, and sound. AUV is capable of tackling any 16 kHz mixed-domain audio segment at bit rates around 700 bps. To accomplish this, we guide the...","url_abs":"https://arxiv.org/abs/2509.21968","url_pdf":"https://arxiv.org/pdf/2509.21968v1","authors":"[\"Yushen Chen\",\"Kai Hu\",\"Long Zhou\",\"Shulin Feng\",\"Xusheng Yang\",\"Hangting Chen\",\"Xie Chen\"]","published":"2025-09-26T06:54:46Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.SD\"]","methods":"[]","has_code":false}
