{"ID":2863828,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25131","arxiv_id":"2509.25131","title":"MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech","abstract":"We present MGM-Omni, a unified Omni LLM for omnimodal understanding and expressive, long-horizon speech generation. Unlike cascaded pipelines that isolate speech synthesis, MGM-Omni adopts a \"brain-mouth\" design with a dual-track, token-based architecture that cleanly decouples multimodal reasoning from real-time speech generation. This design enables efficient cross-modal interaction and low-latency, streaming speech generation. For understanding, a unified training strategy coupled with a dual audio encoder design enables long-form audio perception across diverse acoustic conditions. For generation, a chunk-based parallel decoding scheme narrows the text-speech token-rate gap, accelerating inference and supporting streaming zero-shot voice cloning with stable timbre over extended durations. Compared to concurrent work, MGM-Omni achieves these capabilities with markedly data-efficient training. Extensive experiments demonstrate that MGM-Omni outperforms existing open-source models in preserving timbre identity across extended sequences, producing natural and context-aware speech, and achieving superior long-form audio and omnimodal understanding. MGM-Omni establishes an efficient, end-to-end paradigm for omnimodal understanding and controllable, personalized long-horizon speech generation.","short_abstract":"We present MGM-Omni, a unified Omni LLM for omnimodal understanding and expressive, long-horizon speech generation. Unlike cascaded pipelines that isolate speech synthesis, MGM-Omni adopts a \"brain-mouth\" design with a dual-track, token-based architecture that cleanly decouples multimodal reasoning from real-time spee...","url_abs":"https://arxiv.org/abs/2509.25131","url_pdf":"https://arxiv.org/pdf/2509.25131v1","authors":"[\"Chengyao Wang\",\"Zhisheng Zhong\",\"Bohao Peng\",\"Senqiao Yang\",\"Yuqi Liu\",\"Haokun Gui\",\"Bin Xia\",\"Jingyao Li\",\"Bei Yu\",\"Jiaya Jia\"]","published":"2025-09-29T17:48:28Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\",\"cs.CL\",\"cs.CV\",\"cs.MM\"]","methods":"[\"Large Language Model\"]","has_code":false}
