{"ID":2861579,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.02187","arxiv_id":"2510.02187","title":"High-Fidelity Speech Enhancement via Discrete Audio Tokens","abstract":"Recent autoregressive transformer-based speech enhancement (SE) methods have shown promising results by leveraging advanced semantic understanding and contextual modeling of speech. However, these approaches often rely on complex multi-stage pipelines and low sampling rate codecs, limiting them to narrow and task-specific speech enhancement. In this work, we introduce DAC-SE1, a simplified language model-based SE framework leveraging discrete high-resolution audio representations; DAC-SE1 preserves fine-grained acoustic details while maintaining semantic coherence. Our experiments show that DAC-SE1 surpasses state-of-the-art autoregressive SE methods on both objective perceptual metrics and in a MUSHRA human evaluation. We release our codebase and model checkpoints to support further research in scalable, unified, and high-quality speech enhancement.","short_abstract":"Recent autoregressive transformer-based speech enhancement (SE) methods have shown promising results by leveraging advanced semantic understanding and contextual modeling of speech. However, these approaches often rely on complex multi-stage pipelines and low sampling rate codecs, limiting them to narrow and task-speci...","url_abs":"https://arxiv.org/abs/2510.02187","url_pdf":"https://arxiv.org/pdf/2510.02187v1","authors":"[\"Luca A. Lanzendörfer\",\"Frédéric Berdoz\",\"Antonis Asonitis\",\"Roger Wattenhofer\"]","published":"2025-10-02T16:38:05Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.LG\",\"eess.AS\"]","methods":"[\"Transformer\",\"Language Model\"]","has_code":false}