{"ID":2845463,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04623","arxiv_id":"2511.04623","title":"PromptSep: Generative Audio Separation via Multimodal Prompting","abstract":"Recent breakthroughs in language-queried audio source separation (LASS) have shown that generative models can achieve higher separation audio quality than traditional masking-based approaches. However, two key limitations restrict their practical use: (1) users often require operations beyond separation, such as sound removal; and (2) relying solely on text prompts can be unintuitive for specifying sound sources. In this paper, we propose PromptSep to extend LASS into a broader framework for general-purpose sound separation. PromptSep leverages a conditional diffusion model enhanced with elaborated data simulation to enable both audio extraction and sound removal. To move beyond text-only queries, we incorporate vocal imitation as an additional and more intuitive conditioning modality for our model, by incorporating Sketch2Sound as a data augmentation strategy. Both objective and subjective evaluations on multiple benchmarks demonstrate that PromptSep achieves state-of-the-art performance in sound removal and vocal-imitation-guided source separation, while maintaining competitive results on language-queried source separation.","short_abstract":"Recent breakthroughs in language-queried audio source separation (LASS) have shown that generative models can achieve higher separation audio quality than traditional masking-based approaches. However, two key limitations restrict their practical use: (1) users often require operations beyond separation, such as sound...","url_abs":"https://arxiv.org/abs/2511.04623","url_pdf":"https://arxiv.org/pdf/2511.04623v1","authors":"[\"Yutong Wen\",\"Ke Chen\",\"Prem Seetharaman\",\"Oriol Nieto\",\"Jiaqi Su\",\"Rithesh Kumar\",\"Minje Kim\",\"Paris Smaragdis\",\"Zeyu Jin\",\"Justin Salamon\"]","published":"2025-11-06T18:15:56Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"eess.AS\"]","methods":"[\"Diffusion Model\"]","has_code":false}
