{"ID":2832395,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.06041","arxiv_id":"2512.06041","title":"Technical Report of Nomi Team in the Environmental Sound Deepfake Detection Challenge 2026","abstract":"This paper presents our work for the ICASSP 2026 Environmental Sound Deepfake Detection (ESDD) Challenge. The challenge is based on the large-scale EnvSDD dataset that consists of various synthetic environmental sounds. We focus on addressing the complexities of unseen generators and low-resource black-box scenarios by proposing an audio-text cross-attention model. Experiments with individual and combined text-audio models demonstrate competitive EER improvements over the challenge baseline (BEATs+AASIST model).","short_abstract":"This paper presents our work for the ICASSP 2026 Environmental Sound Deepfake Detection (ESDD) Challenge. The challenge is based on the large-scale EnvSDD dataset that consists of various synthetic environmental sounds. We focus on addressing the complexities of unseen generators and low-resource black-box scenarios by...","url_abs":"https://arxiv.org/abs/2512.06041","url_pdf":"https://arxiv.org/pdf/2512.06041v1","authors":"[\"Candy Olivia Mawalim\",\"Haotian Zhang\",\"Shogo Okada\"]","published":"2025-12-05T03:37:18Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"eess.AS\"]","methods":"[]","has_code":false}