{"ID":2828286,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.22156","arxiv_id":"2512.22156","title":"A Robust framework for sound event localization and detection on real recordings","abstract":"This technical report describes the systems submitted to the DCASE2022 challenge task 3: sound event localization and detection (SELD). The task aims to detect occurrences of sound events and specify their class, furthermore estimate their position. Our system utilizes a ResNet-based model under a proposed robust framework for SELD. To guarantee the generalized performance on the real-world sound scenes, we design the total framework with augmentation techniques, a pipeline of mixing datasets from real-world sound scenes and emulations, and test time augmentation. Augmentation techniques and exploitation of external sound sources enable training diverse samples and keeping the opportunity to train the real-world context enough by maintaining the number of the real recording samples in the batch. In addition, we design a test time augmentation and a clustering-based model ensemble method to aggregate confident predictions. Experimental results show that the model under a proposed framework outperforms the baseline methods and achieves competitive performance in real-world sound recordings.","short_abstract":"This technical report describes the systems submitted to the DCASE2022 challenge task 3: sound event localization and detection (SELD). The task aims to detect occurrences of sound events and specify their class, furthermore estimate their position. Our system utilizes a ResNet-based model under a proposed robust frame...","url_abs":"https://arxiv.org/abs/2512.22156","url_pdf":"https://arxiv.org/pdf/2512.22156v1","authors":"[\"Jin Sob Kim\",\"Hyun Joon Park\",\"Wooseok Shin\",\"Sung Won Han\"]","published":"2025-12-16T01:13:13Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[]","has_code":false}