{"ID":2868242,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.17219","arxiv_id":"2509.17219","title":"Virtual Consistency for Audio Editing","abstract":"Free-form, text-based audio editing remains a persistent challenge, despite progress in inversion-based neural methods. Current approaches rely on slow inversion procedures, limiting their practicality. We present a virtual-consistency based audio editing system that bypasses inversion by adapting the sampling process of diffusion models. Our pipeline is model-agnostic, requiring no fine-tuning or architectural changes, and achieves substantial speed-ups over recent neural editing baselines. Crucially, it achieves this efficiency without compromising quality, as demonstrated by quantitative benchmarks and a user study involving 16 participants.","short_abstract":"Free-form, text-based audio editing remains a persistent challenge, despite progress in inversion-based neural methods. Current approaches rely on slow inversion procedures, limiting their practicality. We present a virtual-consistency based audio editing system that bypasses inversion by adapting the sampling process...","url_abs":"https://arxiv.org/abs/2509.17219","url_pdf":"https://arxiv.org/pdf/2509.17219v1","authors":"[\"Matthieu Cervera\",\"Francesco Paissan\",\"Mirco Ravanelli\",\"Cem Subakan\"]","published":"2025-09-21T19:54:20Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.LG\"]","methods":"[\"Diffusion Model\"]","has_code":false}
