{"ID":2869245,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14785","arxiv_id":"2509.14785","title":"Spatial-CLAP: Learning Spatially-Aware audio--text Embeddings for Multi-Source Conditions","abstract":"Contrastive language--audio pretraining (CLAP) has achieved remarkable success as an audio--text embedding framework, but existing approaches are limited to monaural or single-source conditions and cannot fully capture spatial information. The central challenge in modeling spatial information lies in multi-source conditions, where the correct correspondence between each sound source and its location is required. To tackle this problem, we propose Spatial-CLAP, which introduces a content-aware spatial encoder that enables spatial representations coupled with audio content. We further propose spatial contrastive learning (SCL), a training strategy that explicitly enforces the learning of the correct correspondence and promotes more reliable embeddings under multi-source conditions. Experimental evaluations, including downstream tasks, demonstrate that Spatial-CLAP learns effective embeddings even under multi-source conditions, and confirm the effectiveness of SCL. Moreover, evaluation on unseen three-source mixtures highlights the fundamental distinction between conventional single-source training and our proposed multi-source training paradigm. These findings establish a new paradigm for spatially-aware audio--text embeddings.","short_abstract":"Contrastive language--audio pretraining (CLAP) has achieved remarkable success as an audio--text embedding framework, but existing approaches are limited to monaural or single-source conditions and cannot fully capture spatial information. The central challenge in modeling spatial information lies in multi-source condi...","url_abs":"https://arxiv.org/abs/2509.14785","url_pdf":"https://arxiv.org/pdf/2509.14785v1","authors":"[\"Kentaro Seki\",\"Yuki Okamoto\",\"Kouei Yamaoka\",\"Yuki Saito\",\"Shinnosuke Takamichi\",\"Hiroshi Saruwatari\"]","published":"2025-09-18T09:35:46Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[]","has_code":false}
