{"ID":2860837,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.02797","arxiv_id":"2510.02797","title":"SongFormer: Scaling Music Structure Analysis with Heterogeneous Supervision","abstract":"Music structure analysis (MSA) underpins music understanding and controllable generation, yet progress has been limited by small, inconsistent corpora. We present SongFormer, a scalable framework that learns from heterogeneous supervision. SongFormer (i) fuses short- and long-window self-supervised learning representations to capture both fine-grained and long-range dependencies, and (ii) introduces a learned source embedding to enable training with partial, noisy, and schema-mismatched labels. To support scaling and fair evaluation, we release SongFormDB, the largest MSA corpus to date (over 14k songs spanning languages and genres), and SongFormBench, a 300-song expert-verified benchmark. On SongFormBench, SongFormer sets a new state of the art in strict boundary detection (HR.5F) and achieves the highest functional label accuracy, while remaining computationally efficient; it surpasses strong baselines and Gemini 2.5 Pro on these metrics and remains competitive under relaxed tolerance (HR3F). Code, datasets, and model are open-sourced at https://github.com/ASLP-lab/SongFormer.","short_abstract":"Music structure analysis (MSA) underpins music understanding and controllable generation, yet progress has been limited by small, inconsistent corpora. We present SongFormer, a scalable framework that learns from heterogeneous supervision. SongFormer (i) fuses short- and long-window self-supervised learning representat...","url_abs":"https://arxiv.org/abs/2510.02797","url_pdf":"https://arxiv.org/pdf/2510.02797v3","authors":"[\"Chunbo Hao\",\"Ruibin Yuan\",\"Jixun Yao\",\"Qixin Deng\",\"Xinyi Bai\",\"Yanbo Wang\",\"Wei Xue\",\"Lei Xie\"]","published":"2025-10-03T08:10:19Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[]","has_code":false,"code_links":[{"ID":608762,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2860837,"paper_url":"https://arxiv.org/abs/2510.02797","paper_title":"SongFormer: Scaling Music Structure Analysis with Heterogeneous Supervision","repo_url":"https://github.com/ASLP-lab/SongFormer","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}