{"ID":2884980,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.04996","arxiv_id":"2508.04996","title":"REF-VC: Robust, Expressive and Fast Zero-Shot Voice Conversion with Diffusion Transformers","abstract":"In real-world voice conversion applications, environmental noise in source speech and user demands for expressive output pose critical challenges. Traditional ASR-based methods ensure noise robustness but suppress prosody richness, while SSL-based models improve expressiveness but suffer from timbre leakage and noise sensitivity. This paper proposes REF-VC, a noise-robust expressive voice conversion system. Key innovations include: (1) A random erasing strategy to mitigate the information redundancy inherent in SSL features, enhancing noise robustness and expressiveness; (2) Implicit alignment inspired by E2TTS to suppress non-essential feature reconstruction; (3) Integration of Shortcut Models to accelerate flow matching inference, significantly reducing to 4 steps. Experimental results demonstrate that REF-VC outperforms baselines such as Seed-VC in zero-shot scenarios on the noisy set, while also performing comparably to Seed-VC on the clean set. In addition, REF-VC can be compatible with singing voice conversion within one model.","short_abstract":"In real-world voice conversion applications, environmental noise in source speech and user demands for expressive output pose critical challenges. Traditional ASR-based methods ensure noise robustness but suppress prosody richness, while SSL-based models improve expressiveness but suffer from timbre leakage and noise s...","url_abs":"https://arxiv.org/abs/2508.04996","url_pdf":"https://arxiv.org/pdf/2508.04996v2","authors":"[\"Yuepeng Jiang\",\"Ziqian Ning\",\"Shuai Wang\",\"Chengjia Wang\",\"Mengxiao Bi\",\"Pengcheng Zhu\",\"Zhonghua Fu\",\"Lei Xie\"]","published":"2025-08-07T03:08:49Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
