{"ID":2828461,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.14291","arxiv_id":"2512.14291","title":"GLM-TTS Technical Report","abstract":"This work proposes GLM-TTS, a production-level TTS system designed for efficiency, controllability, and high-fidelity speech generation. GLM-TTS follows a two-stage architecture, consisting of a text-to-token autoregressive model and a token-to-waveform diffusion model. With only 100k hours of training data, GLM-TTS achieves state-of-the-art performance on multiple open-source benchmarks. To meet production requirements, GLM-TTS improves speech quality through an optimized speech tokenizer with fundamental frequency constraints and a GRPO-based multi-reward reinforcement learning framework that jointly optimizes pronunciation, speaker similarity, and expressive prosody. In parallel, the system enables efficient and controllable deployment via parameter-efficient LoRA-based voice customization and a hybrid phoneme-text input scheme that provides precise pronunciation control. Our code is available at https://github.com/zai-org/GLM-TTS. Real-time speech synthesis demos are provided via Z.ai (audio.z.ai), the Zhipu Qingyan app/web (chatglm.cn).","short_abstract":"This work proposes GLM-TTS, a production-level TTS system designed for efficiency, controllability, and high-fidelity speech generation. GLM-TTS follows a two-stage architecture, consisting of a text-to-token autoregressive model and a token-to-waveform diffusion model. With only 100k hours of training data, GLM-TTS ac...","url_abs":"https://arxiv.org/abs/2512.14291","url_pdf":"https://arxiv.org/pdf/2512.14291v1","authors":"[\"Jiayan Cui\",\"Zhihan Yang\",\"Naihan Li\",\"Jiankun Tian\",\"Xingyu Ma\",\"Yi Zhang\",\"Guangyu Chen\",\"Runxuan Yang\",\"Yuqing Cheng\",\"Yizhi Zhou\",\"Guochen Yu\",\"Xiaotao Gu\",\"Jie Tang\"]","published":"2025-12-16T11:04:41Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[\"Reinforcement Learning\",\"Diffusion Model\",\"LoRA\"]","has_code":false,"code_links":[{"ID":605876,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2828461,"paper_url":"https://arxiv.org/abs/2512.14291","paper_title":"GLM-TTS Technical Report","repo_url":"https://github.com/zai-org/GLM-TTS","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
