{"ID":2869684,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14298","arxiv_id":"2509.14298","title":"SpeechOp: Inference-Time Task Composition for Generative Speech Processing","abstract":"While generative Text-to-Speech (TTS) systems leverage vast \"in-the-wild\" data to achieve remarkable success, speech-to-speech processing tasks like enhancement face data limitations, which lead data-hungry generative approaches to distort speech content and speaker identity. To bridge this gap, we present SpeechOp, a multi-task latent diffusion model that transforms pre-trained TTS models into a universal speech processor capable of performing a wide range of speech tasks and composing them in novel ways at inference time. By adapting a pre-trained TTS model, SpeechOp inherits a rich understanding of natural speech, accelerating training and improving S2S task quality, while simultaneously enhancing core TTS performance. Finally, we introduce Implicit Task Composition (ITC), a novel pipeline where ASR-derived transcripts (e.g., from Whisper) guide SpeechOp's enhancement via our principled inference-time task composition. ITC achieves state-of-the-art content preservation by robustly combining web-scale speech understanding with SpeechOp's generative capabilities. Audio samples are available at https://justinlovelace.github.io/projects/speechop","short_abstract":"While generative Text-to-Speech (TTS) systems leverage vast \"in-the-wild\" data to achieve remarkable success, speech-to-speech processing tasks like enhancement face data limitations, which lead data-hungry generative approaches to distort speech content and speaker identity. To bridge this gap, we present SpeechOp, a...","url_abs":"https://arxiv.org/abs/2509.14298","url_pdf":"https://arxiv.org/pdf/2509.14298v1","authors":"[\"Justin Lovelace\",\"Rithesh Kumar\",\"Jiaqi Su\",\"Ke Chen\",\"Kilian Q Weinberger\",\"Zeyu Jin\"]","published":"2025-09-17T05:05:55Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.LG\"]","methods":"[\"Diffusion Model\"]","has_code":false}
