{"ID":2845112,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.03942","arxiv_id":"2511.03942","title":"MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation","abstract":"We present MIDI-LLM, an LLM for generating multitrack MIDI music from free-form text prompts. Our approach expands a text LLM's vocabulary to include MIDI tokens, and uses a two-stage training recipe to endow text-to-MIDI abilities. By preserving the original LLM's parameter structure, we can directly leverage the vLLM library for accelerated inference. Experiments show that MIDI-LLM achieves higher quality, better text control, and faster inference compared to the recent Text2midi model. Live demo at https://midi-llm-demo.vercel.app.","short_abstract":"We present MIDI-LLM, an LLM for generating multitrack MIDI music from free-form text prompts. Our approach expands a text LLM's vocabulary to include MIDI tokens, and uses a two-stage training recipe to endow text-to-MIDI abilities. By preserving the original LLM's parameter structure, we can directly leverage the vLLM...","url_abs":"https://arxiv.org/abs/2511.03942","url_pdf":"https://arxiv.org/pdf/2511.03942v1","authors":"[\"Shih-Lun Wu\",\"Yoon Kim\",\"Cheng-Zhi Anna Huang\"]","published":"2025-11-06T00:40:07Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.CL\",\"cs.MM\"]","methods":"[\"Large Language Model\",\"Language Model\"]","project_urls":"[\"https://midi-llm-demo.vercel.app\"]","has_code":false}
