{"ID":2857942,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.07840","arxiv_id":"2510.07840","title":"ACMID: Automatic Curation of Musical Instrument Dataset for 7-Stem Music Source Separation","abstract":"Most current music source separation (MSS) methods rely on supervised learning, limited by training data quantity and quality. Though web-crawling can bring abundant data, platform-level track labeling often causes metadata mismatches, impeding accurate \"audio-label\" pair acquisition. To address this, we present ACMID: a dataset for MSS generated through web crawling of extensive raw data, followed by automatic cleaning via an instrument classifier built on a pre-trained audio encoder that filters and aggregates clean segments of target instruments from the crawled tracks, resulting in the refined ACMID-Cleaned dataset. Leveraging abundant data, we expand the conventional classification from 4-stem (Vocal/Bass/Drums/Others) to 7-stem (Piano/Drums/Bass/Acoustic Guitar/Electric Guitar/Strings/Wind-Brass), enabling high granularity MSS systems. Experiments on a SOTA MSS model demonstrate two key results: (i) MSS model trained with ACMID-Cleaned achieved a 2.39dB improvement in SDR performance compared to that with ACMID-Uncleaned, demonstrating the effectiveness of our data cleaning procedure; (ii) incorporating ACMID-Cleaned to training enhances MSS model's average performance by 1.16dB, confirming the value of our dataset. Our data crawling code, cleaning model code and weights are available at: https://github.com/scottishfold0621/ACMID.","short_abstract":"Most current music source separation (MSS) methods rely on supervised learning, limited by training data quantity and quality. Though web-crawling can bring abundant data, platform-level track labeling often causes metadata mismatches, impeding accurate \"audio-label\" pair acquisition. To address this, we present ACMID:...","url_abs":"https://arxiv.org/abs/2510.07840","url_pdf":"https://arxiv.org/pdf/2510.07840v1","authors":"[\"Ji Yu\",\"Yang shuo\",\"Xu Yuetonghui\",\"Liu Mengmei\",\"Ji Qiang\",\"Han Zerui\"]","published":"2025-10-09T06:32:04Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"eess.AS\"]","methods":"[]","has_code":false,"code_links":[{"ID":608503,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2857942,"paper_url":"https://arxiv.org/abs/2510.07840","paper_title":"ACMID: Automatic Curation of Musical Instrument Dataset for 7-Stem Music Source Separation","repo_url":"https://github.com/scottishfold0621/ACMID","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
