{"ID":2866861,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18531","arxiv_id":"2509.18531","title":"No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS","abstract":"Recent work reports gains in neural text-to-speech (TTS) with Group Relative Policy Optimization (GRPO). However, in the absence of a verifiable reward for \\textit{prosody}, GRPO trained on transcription-oriented signals (CER/NLL) lowers error rates yet collapses prosody into monotone, unnatural speech; adding speaker-similarity further destabilizes training and degrades CER. We address this with an \\textit{iterative Direct Preference Optimization (DPO)} scheme that uses only a few hundred human-labeled preference pairs per round to directly optimize prosodic naturalness while regularizing to the current model. On \\textbf{KoCC-TTS}, a curated dataset of authentic Korean call center interactions capturing task-oriented dialogues, our method attains the highest human preference (ELO) with competitive CER, outperforming GRPO and strong commercial baselines. These results suggest that when prosody cannot be rewarded automatically, \\textit{human preference optimization} offers a practical and data-efficient path to natural and robust TTS. The demo page is available at \\href{https://tts.ch.dev}","short_abstract":"Recent work reports gains in neural text-to-speech (TTS) with Group Relative Policy Optimization (GRPO). However, in the absence of a verifiable reward for \\textit{prosody}, GRPO trained on transcription-oriented signals (CER/NLL) lowers error rates yet collapses prosody into monotone, unnatural speech; adding speaker-...","url_abs":"https://arxiv.org/abs/2509.18531","url_pdf":"https://arxiv.org/pdf/2509.18531v2","authors":"[\"Seungyoun Shin\",\"Dongha Ahn\",\"Jiwoo Kim\",\"Sungwook Jeon\"]","published":"2025-09-23T01:51:38Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.AI\",\"cs.CL\",\"cs.SD\"]","methods":"[]","project_urls":"[\"https://tts.ch.dev\"]","has_code":false}
