{"ID":2889856,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.20128","arxiv_id":"2507.20128","title":"Diffusion-based Symbolic Music Generation with Structured State Space Models","abstract":"Recent advancements in diffusion models have significantly improved symbolic music generation. However, most approaches rely on transformer-based architectures with self-attention mechanisms, which are constrained by quadratic computational complexity, limiting scalability for long sequences. To address this, we propose Symbolic Music Diffusion with Mamba (SMDIM), a novel diffusion-based architecture integrating Structured State Space Models (SSMs) for efficient global context modeling and the Mamba-FeedForward-Attention Block (MFA) for precise local detail preservation. The MFA Block combines the linear complexity of Mamba layers, the non-linear refinement of FeedForward layers, and the fine-grained precision of self-attention mechanisms, achieving a balance between scalability and musical expressiveness. SMDIM achieves near-linear complexity, making it highly efficient for long-sequence tasks. Evaluated on diverse datasets, including FolkDB, a collection of traditional Chinese folk music that represents an underexplored domain in symbolic music generation, SMDIM outperforms state-of-the-art models in both generation quality and computational efficiency. Beyond symbolic music, SMDIM's architectural design demonstrates adaptability to a broad range of long-sequence generation tasks, offering a scalable and efficient solution for coherent sequence modeling.","short_abstract":"Recent advancements in diffusion models have significantly improved symbolic music generation. However, most approaches rely on transformer-based architectures with self-attention mechanisms, which are constrained by quadratic computational complexity, limiting scalability for long sequences. To address this, we propos...","url_abs":"https://arxiv.org/abs/2507.20128","url_pdf":"https://arxiv.org/pdf/2507.20128v2","authors":"[\"Shenghua Yuan\",\"Xing Tang\",\"Jiatao Chen\",\"Tianming Xie\",\"Jing Wang\",\"Bing Shi\"]","published":"2025-07-27T04:53:45Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[\"Diffusion Model\",\"Transformer\"]","has_code":false}
