{"ID":2836920,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.20224","arxiv_id":"2511.20224","title":"DuoTok: Source-Aware Dual-Track Tokenization for Multi-Track Music Language Modeling","abstract":"Audio tokenization bridges continuous waveforms and multi-track music language models. In dual-track modeling, tokens should preserve three properties at once: high-fidelity reconstruction, strong predictability under a language model, and cross-track correspondence. We introduce DuoTok, a source-aware dual-track tokenizer that addresses this trade-off through staged disentanglement. DuoTok first pretrains a semantic encoder, then regularizes it with multi-task supervision, freezes the encoder, and applies hard dual-codebook routing while keeping auxiliary objectives on quantized codes. A diffusion decoder reconstructs high-frequency details, allowing tokens to focus on structured information for sequence modeling. On standard benchmarks, DuoTok achieves a favorable predictability-fidelity trade-off, reaching the lowest cnBPT while maintaining competitive reconstruction at 0.75 kbps. Under a held-constant dual-track language modeling protocol, enBPT also improves, indicating gains beyond codebook size effects. Controlled diagnostics show larger predictability costs under cross-track corruption and larger gains from longer context, suggesting that models trained on DuoTok tokens use cross-track structure and non-local history.","short_abstract":"Audio tokenization bridges continuous waveforms and multi-track music language models. In dual-track modeling, tokens should preserve three properties at once: high-fidelity reconstruction, strong predictability under a language model, and cross-track correspondence. We introduce DuoTok, a source-aware dual-track token...","url_abs":"https://arxiv.org/abs/2511.20224","url_pdf":"https://arxiv.org/pdf/2511.20224v2","authors":"[\"Rui Lin\",\"Zhiyue Wu\",\"Jiahe Le\",\"Kangdi Wang\",\"Weixiong Chen\",\"Junyu Dai\",\"Tao Jiang\"]","published":"2025-11-25T11:53:57Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\"]","methods":"[\"Diffusion Model\",\"Language Model\"]","has_code":false}
