{"ID":2860072,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.05305","arxiv_id":"2510.05305","title":"WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection","abstract":"Modern front-end design for speech deepfake detection relies on full fine-tuning of large pre-trained models like XLSR. However, this approach is not parameter-efficient and may lead to suboptimal generalization to realistic, in-the-wild data types. To address these limitations, we introduce a new family of parameter-efficient front-ends that fuse prompt-tuning with classical signal processing transforms. These include FourierPT-XLSR, which uses the Fourier Transform, and two variants based on the Wavelet Transform: WSPT-XLSR and Partial-WSPT-XLSR. We further propose WaveSP-Net, a novel architecture combining a Partial-WSPT-XLSR front-end and a bidirectional Mamba-based back-end. This design injects multi-resolution features into the prompt embeddings, which enhances the localization of subtle synthetic artifacts without altering the frozen XLSR parameters. Experimental results demonstrate that WaveSP-Net outperforms several state-of-the-art models on two new and challenging benchmarks, Deepfake-Eval-2024 and SpoofCeleb, with low trainable parameters and notable performance gains. The code and models are available at https://github.com/xxuan-acoustics/WaveSP-Net.","short_abstract":"Modern front-end design for speech deepfake detection relies on full fine-tuning of large pre-trained models like XLSR. However, this approach is not parameter-efficient and may lead to suboptimal generalization to realistic, in-the-wild data types. \nTo address these limitations, we introduce a new family of parameter-e...","url_abs":"https://arxiv.org/abs/2510.05305","url_pdf":"https://arxiv.org/pdf/2510.05305v2","authors":"[\"Xi Xuan\",\"Xuechen Liu\",\"Wenxin Zhang\",\"Yi-Cheng Lin\",\"Xiaojian Lin\",\"Tomi Kinnunen\"]","published":"2025-10-06T19:17:18Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.CL\",\"eess.SP\"]","methods":"[]","has_code":false,"code_links":[{"ID":608697,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2860072,"paper_url":"https://arxiv.org/abs/2510.05305","paper_title":"WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection","repo_url":"https://github.com/xxuan-acoustics/WaveSP-Net","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
