{"ID":2840481,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.13146","arxiv_id":"2511.13146","title":"Towards Practical Real-Time Low-Latency Music Source Separation","abstract":"In recent years, significant progress has been made in the field of deep learning for music demixing. However, there has been limited attention on real-time, low-latency music demixing, which holds potential for various applications, such as hearing aids, audio stream remixing, and live performances. Additionally, a notable tendency has emerged towards the development of larger models, limiting their applicability in certain scenarios. In this paper, we introduce a lightweight real-time low-latency model called Real-Time Single-Path TFC-TDF UNET (RT-STT), which is based on the Dual-Path TFC-TDF UNET (DTTNet). In RT-STT, we propose a feature fusion technique based on channel expansion. We also demonstrate the superiority of single-path modeling over dual-path modeling in real-time models. Moreover, we investigate the method of quantization to further reduce inference time. RT-STT exhibits superior performance with significantly fewer parameters and shorter inference times compared to state-of-the-art models.","short_abstract":"In recent years, significant progress has been made in the field of deep learning for music demixing. However, there has been limited attention on real-time, low-latency music demixing, which holds potential for various applications, such as hearing aids, audio stream remixing, and live performances. Additionally, a no...","url_abs":"https://arxiv.org/abs/2511.13146","url_pdf":"https://arxiv.org/pdf/2511.13146v1","authors":"[\"Junyu Wu\",\"Jie Liu\",\"Tianrui Pan\",\"Jie Tang\",\"Gangshan Wu\"]","published":"2025-11-17T08:56:14Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.MM\"]","methods":"[]","has_code":false}