{"ID":2859263,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.05758","arxiv_id":"2510.05758","title":"EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS","abstract":"Recent LLM-based TTS systems achieve strong quality and zero-shot ability, but lack fine-grained emotional control due to their reliance on discrete speech tokens. Existing approaches either limit emotions to categorical labels or cannot generalize to LLM-based architectures. We propose EMORL-TTS (Fine-grained Emotion-controllable TTS with Reinforcement Learning), a framework that unifies global intensity control in the VAD space with local emphasis regulation. Our method combines supervised fine-tuning with reinforcement learning guided by task-specific rewards for emotion category, intensity, and emphasis. Moreover, we further investigate how emphasis placement modulates fine-grained emotion intensity. Experiments show that EMORL-TTS improves emotion accuracy, intensity differentiation, and emphasis clarity, while preserving synthesis quality comparable to strong LLM-based baselines.","short_abstract":"Recent LLM-based TTS systems achieve strong quality and zero-shot ability, but lack fine-grained emotional control due to their reliance on discrete speech tokens. Existing approaches either limit emotions to categorical labels or cannot generalize to LLM-based architectures. We propose EMORL-TTS (Fine-grained Emotion-...","url_abs":"https://arxiv.org/abs/2510.05758","url_pdf":"https://arxiv.org/pdf/2510.05758v2","authors":"[\"Haoxun Li\",\"Yu Liu\",\"Yuqing Sun\",\"Hanlei Shi\",\"Leyuan Qu\",\"Taihao Li\"]","published":"2025-10-07T10:24:12Z","proceeding":"cs.SD","tasks":"[\"cs.SD\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\"]","has_code":false}
