{"ID":2825850,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.20369","arxiv_id":"2512.20369","title":"EnvSSLAM-FFN: Lightweight Layer-Fused System for ESDD 2026 Challenge","abstract":"Recent advances in generative audio models have enabled high-fidelity environmental sound synthesis, raising serious concerns for audio security. The ESDD 2026 Challenge therefore addresses environmental sound deepfake detection under unseen generators (Track 1) and black-box low-resource detection (Track 2) conditions. We propose EnvSSLAM-FFN, which integrates a frozen SSLAM self-supervised encoder with a lightweight FFN back-end. To effectively capture spoofing artifacts under severe data imbalance, we fuse intermediate SSLAM representations from layers 4-9 and adopt a class-weighted training objective. Experimental results show that the proposed system consistently outperforms the official baselines on both tracks, achieving Test Equal Error Rates (EERs) of 1.20% and 1.05%, respectively.","short_abstract":"Recent advances in generative audio models have enabled high-fidelity environmental sound synthesis, raising serious concerns for audio security. The ESDD 2026 Challenge therefore addresses environmental sound deepfake detection under unseen generators (Track 1) and black-box low-resource detection (Track 2) conditions...","url_abs":"https://arxiv.org/abs/2512.20369","url_pdf":"https://arxiv.org/pdf/2512.20369v1","authors":"[\"Xiaoxuan Guo\",\"Hengyan Huang\",\"Jiayi Zhou\",\"Renhe Sun\",\"Jian Liu\",\"Haonan Cheng\",\"Long Ye\",\"Qin Zhang\"]","published":"2025-12-23T13:54:02Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"eess.AS\"]","methods":"[]","has_code":false}
