{"ID":2848775,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.25955","arxiv_id":"2510.25955","title":"SPEAR: A Unified SSL Framework for Learning Speech and Audio Representations","abstract":"Self-supervised learning (SSL) has significantly advanced acoustic representation learning. However, most existing models are optimised for either speech or audio event understanding, resulting in a persistent gap between these two domains. We address this gap with SPEAR (SPEech and Audio Representations), a self-supervised framework that distils complementary knowledge from a speech-focused SSL teacher and a general-audio SSL teacher into a single unified model. SPEAR applies multi-codebook vector quantisation to continuous teacher representations to produce fine-grained discrete tokens that capture both semantic and acoustic information. To effectively integrate these heterogeneous representations, SPEAR jointly predicts them given a masked input with an asymmetric pre-training loss. We further improve robustness in complex sound scenes through a novel token mixing mechanism. Extensive experiments demonstrate that SPEAR consistently outperforms existing unified speech and audio models. SPEAR establishes a new state-of-the-art on the SUPERB benchmark, surpassing WavLM Large on 12 of 15 tasks, while achieving competitive performance on the HEAR benchmark. These results position SPEAR as a versatile foundation for general-purpose speech and audio representation learning. The code and pre-trained models will be released.","short_abstract":"Self-supervised learning (SSL) has significantly advanced acoustic representation learning. However, most existing models are optimised for either speech or audio event understanding, resulting in a persistent gap between these two domains. We address this gap with SPEAR (SPEech and Audio Representations), a self-super...","url_abs":"https://arxiv.org/abs/2510.25955","url_pdf":"https://arxiv.org/pdf/2510.25955v3","authors":"[\"Xiaoyu Yang\",\"Yifan Yang\",\"Zengrui Jin\",\"Ziyun Cui\",\"Wen Wu\",\"Baoxiang Li\",\"Chao Zhang\",\"Phil Woodland\"]","published":"2025-10-29T20:53:12Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[]","has_code":false}
