{"ID":2885125,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.05207","arxiv_id":"2508.05207","title":"SpectroStream: A Versatile Neural Codec for General Audio","abstract":"We propose SpectroStream, a full-band multi-channel neural audio codec. Successor to the well-established SoundStream, SpectroStream extends its capability beyond 24 kHz monophonic audio and enables high-quality reconstruction of 48 kHz stereo music at bit rates of 4--16 kbps. This is accomplished with a new neural architecture that leverages audio representation in the time-frequency domain, which leads to better audio quality especially at higher sample rate. The model also uses a delayed-fusion strategy to handle multi-channel audio, which is crucial in balancing per-channel acoustic quality and cross-channel phase consistency.","short_abstract":"We propose SpectroStream, a full-band multi-channel neural audio codec. Successor to the well-established SoundStream, SpectroStream extends its capability beyond 24 kHz monophonic audio and enables high-quality reconstruction of 48 kHz stereo music at bit rates of 4--16 kbps. This is accomplished with a new neural arc...","url_abs":"https://arxiv.org/abs/2508.05207","url_pdf":"https://arxiv.org/pdf/2508.05207v1","authors":"[\"Yunpeng Li\",\"Kehang Han\",\"Brian McWilliams\",\"Zalan Borsos\",\"Marco Tagliasacchi\"]","published":"2025-08-07T09:44:00Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\",\"eess.AS\"]","methods":"[]","has_code":false}
