{"ID":2881287,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.13320","arxiv_id":"2508.13320","title":"Rapidly Adapting to New Voice Spoofing: Few-Shot Detection of Synthesized Speech Under Distribution Shifts","abstract":"We address the challenge of detecting synthesized speech under distribution shifts -- arising from unseen synthesis methods, speakers, languages, or audio conditions -- relative to the training data. Few-shot learning methods are a promising way to tackle distribution shifts by rapidly adapting on the basis of a few in-distribution samples. We propose a self-attentive prototypical network to enable more robust few-shot adaptation. To evaluate our approach, we systematically compare the performance of traditional zero-shot detectors and the proposed few-shot detectors, carefully controlling training conditions to introduce distribution shifts at evaluation time. In conditions where distribution shifts hamper the zero-shot performance, our proposed few-shot adaptation technique can quickly adapt using as few as 10 in-distribution samples -- achieving upto 32% relative EER reduction on deepfakes in Japanese language and 20% relative reduction on ASVspoof 2021 Deepfake dataset.","short_abstract":"We address the challenge of detecting synthesized speech under distribution shifts -- arising from unseen synthesis methods, speakers, languages, or audio conditions -- relative to the training data. Few-shot learning methods are a promising way to tackle distribution shifts by rapidly adapting on the basis of a few in...","url_abs":"https://arxiv.org/abs/2508.13320","url_pdf":"https://arxiv.org/pdf/2508.13320v1","authors":"[\"Ashi Garg\",\"Zexin Cai\",\"Henry Li Xinyuan\",\"Leibny Paola García-Perera\",\"Kevin Duh\",\"Sanjeev Khudanpur\",\"Matthew Wiesner\",\"Nicholas Andrews\"]","published":"2025-08-18T19:14:45Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[]","has_code":false}
