Advancing Zero-Shot Open-Set Speech Deepfake Source Tracing
Abstract
We propose a novel zero-shot source tracing framework inspired by speaker verification. We adapt SSL-AASIST for attack classification, enhance its embeddings with AAM loss and RegMixup, and ensure that the training attacks are disjoint from those forming fingerprint-trial pairs. For backend scoring in attack verification, we explore both zero-shot approaches (cosine similarity and Siamese) and few-shot approaches (MLP and Siamese). Experiments on our recently introduced STOPA dataset in an open-set setting show that few-shot learning provides advantages in the in-distribution (ID) scenario, while zero-shot approaches perform better in the out-of-distribution (OOD) scenario. In attack source verification with ID trials, few-shot Siamese and MLP achieve equal error rates (EERs) of 17.72% and 13.11%, respectively, compared to 29.91% for zero-shot cosine scoring. Conversely, in OOD trials, zero-shot cosine scoring reaches an EER of 16.43%, outperforming few-shot Siamese at 23.47% and MLP at 21.57%.
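As an illustration of the zero-shot cosine backend and the EER metric mentioned above, the following is a minimal sketch rather than the evaluation code used in this work. It assumes that each attack fingerprint and each trial is already represented by a fixed-length embedding vector. The function names cosine_score and equal_error_rate are hypothetical, and the EER is approximated by a simple threshold sweep over the observed scores.

```python
import numpy as np

def cosine_score(fingerprint, trial):
    """Cosine similarity between a fingerprint embedding and a trial embedding.

    Both inputs are assumed to be 1-D vectors of equal length (an assumption
    for this sketch; the embedding extractor itself is not shown).
    """
    f = np.asarray(fingerprint, dtype=float)
    t = np.asarray(trial, dtype=float)
    return float(f @ t / (np.linalg.norm(f) * np.linalg.norm(t)))

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where the false rejection and false acceptance rates
    are closest, and is reported as their average.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection rate: same-attack trials scoring below the threshold.
    frr = np.array([(target < th).mean() for th in thresholds])
    # False acceptance rate: different-attack trials at or above the threshold.
    far = np.array([(nontarget >= th).mean() for th in thresholds])
    idx = int(np.argmin(np.abs(frr - far)))
    return float((frr[idx] + far[idx]) / 2)

if __name__ == "__main__":
    # Toy embeddings, invented purely for illustration.
    fingerprint = [1.0, 0.0, 0.0]
    same_attack = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
    other_attack = [[0.0, 1.0, 0.0], [0.1, 0.2, 0.9]]
    target = [cosine_score(fingerprint, e) for e in same_attack]
    nontarget = [cosine_score(fingerprint, e) for e in other_attack]
    print("EER:", equal_error_rate(target, nontarget))
```

Because the cosine backend needs no training on the trial attacks, it applies unchanged to attacks unseen during training, which is the sense in which it is zero-shot. The few-shot MLP and Siamese backends would replace cosine_score with a learned scoring function.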