{"ID":2894668,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.09929","arxiv_id":"2507.09929","title":"Aligning Generative Speech Enhancement with Perceptual Feedback","abstract":"Language Model (LM)-based speech enhancement (SE) has recently emerged as a promising direction, but existing approaches predominantly rely on token-level likelihood objectives that weakly reflect human perception. This mismatch limits progress, as optimizing signal accuracy does not always improve naturalness or listening comfort. We address this gap by introducing a perceptually aligned LM-based SE approach. Our method applies Direct Preference Optimization (DPO) with UTMOS, a neural MOS predictor, as a proxy for human ratings, directly steering models toward perceptually preferred outputs. This design directly connects model training to perceptual quality and is broadly applicable within LM-based SE frameworks. On the Deep Noise Suppression Challenge 2020 test sets, our approach consistently improves speech quality metrics, achieving relative gains of up to 56%. To our knowledge, this is the first integration of perceptual feedback into LM-based SE and the first application of DPO in the SE domain, establishing a new paradigm for perceptually aligned enhancement with SE.","short_abstract":"Language Model (LM)-based speech enhancement (SE) has recently emerged as a promising direction, but existing approaches predominantly rely on token-level likelihood objectives that weakly reflect human perception. \nThis mismatch limits progress, as optimizing signal accuracy does not always improve naturalness or liste...","url_abs":"https://arxiv.org/abs/2507.09929","url_pdf":"https://arxiv.org/pdf/2507.09929v2","authors":"[\"Haoyang Li\",\"Nana Hou\",\"Yuchen Hu\",\"Jixun Yao\",\"Sabato Marco Siniscalchi\",\"Xuyi Zhuang\",\"Deheng Ye\",\"Wei Yang\",\"Eng Siong Chng\"]","published":"2025-07-14T05:15:39Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Language Model\"]","has_code":false}
