{"ID":2826468,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.18572","arxiv_id":"2512.18572","title":"MeanFlow-TSE: One-Step Generative Target Speaker Extraction with Mean Flow","abstract":"Target speaker extraction (TSE) aims to isolate a desired speaker's voice from a multi-speaker mixture using auxiliary information such as a reference utterance. Although recent advances in diffusion and flow-matching models have improved TSE performance, these methods typically require multi-step sampling, which limits their practicality in low-latency settings. In this work, we propose MeanFlow-TSE, a one-step generative TSE framework trained with mean-flow objectives, enabling fast and high-quality generation without iterative refinement. Building on the AD-FlowTSE paradigm, our method defines a flow between the background and target source that is governed by the mixing ratio (MR). Experiments on the Libri2Mix corpus show that our approach outperforms existing diffusion- and flow-matching-based TSE models in separation quality and perceptual metrics while requiring only a single inference step. These results demonstrate that mean-flow-guided one-step generation offers an effective and efficient alternative for real-time target speaker extraction. Code is available at https://github.com/rikishimizu/MeanFlow-TSE.","short_abstract":"Target speaker extraction (TSE) aims to isolate a desired speaker's voice from a multi-speaker mixture using auxiliary information such as a reference utterance. Although recent advances in diffusion and flow-matching models have improved TSE performance, these methods typically require multi-step sampling, which limit...","url_abs":"https://arxiv.org/abs/2512.18572","url_pdf":"https://arxiv.org/pdf/2512.18572v1","authors":"[\"Riki Shimizu\",\"Xilin Jiang\",\"Nima Mesgarani\"]","published":"2025-12-21T02:50:36Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":605745,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2826468,"paper_url":"https://arxiv.org/abs/2512.18572","paper_title":"MeanFlow-TSE: One-Step Generative Target Speaker Extraction with Mean Flow","repo_url":"https://github.com/rikishimizu/MeanFlow-TSE","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}