{"ID":2870970,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12003","arxiv_id":"2509.12003","title":"Improving Out-of-Domain Audio Deepfake Detection via Layer Selection and Fusion of SSL-Based Countermeasures","abstract":"Audio deepfake detection systems based on frozen pre-trained self-supervised learning (SSL) encoders show a high level of performance when combined with layer-weighted pooling methods, such as multi-head factorized attentive pooling (MHFA). However, they still struggle to generalize to out-of-domain (OOD) conditions. We tackle this problem by studying the behavior of six different pre-trained SSLs, on four different test corpora. We perform a layer-by-layer analysis to determine which layers contribute most. Next, we study the pooling head, comparing a strategy based on a single layer with automatic selection via MHFA. We observed that selecting the best layer gave very good results, while reducing system parameters by up to 80%. A wide variation in performance as a function of test corpus and SSL model is also observed, showing that the pre-training strategy of the encoder plays a role. Finally, score-level fusion of several encoders improved generalization to OOD attacks.","short_abstract":"Audio deepfake detection systems based on frozen pre-trained self-supervised learning (SSL) encoders show a high level of performance when combined with layer-weighted pooling methods, such as multi-head factorized attentive pooling (MHFA). However, they still struggle to generalize to out-of-domain (OOD) conditions. W...","url_abs":"https://arxiv.org/abs/2509.12003","url_pdf":"https://arxiv.org/pdf/2509.12003v1","authors":"[\"Pierre Serrano\",\"Raphaël Duroselle\",\"Florian Angulo\",\"Jean-François Bonastre\",\"Olivier Boeffard\"]","published":"2025-09-15T14:50:21Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.LG\"]","methods":"[]","has_code":false}
