{"ID":2897873,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.04349","arxiv_id":"2507.04349","title":"TTS-CtrlNet: Time varying emotion aligned text-to-speech generation with ControlNet","abstract":"Recent advances in text-to-speech (TTS) have enabled natural speech synthesis, but fine-grained, time-varying emotion control remains challenging. Existing methods often allow only utterance-level control and require full model fine-tuning with a large emotion speech dataset, which can degrade performance. Inspired by adding conditional control to the existing model in ControlNet (Zhang et al, 2023), we propose the first ControlNet-based approach for controllable flow-matching TTS (TTS-CtrlNet), which freezes the original model and introduces a trainable copy of it to process additional conditions. We show that TTS-CtrlNet can boost the pretrained large TTS model by adding intuitive, scalable, and time-varying emotion control while inheriting the ability of the original model (e.g., zero-shot voice cloning \u0026 naturalness). Furthermore, we provide practical recipes for adding emotion control: 1) optimal architecture design choice with block analysis, 2) emotion-specific flow step, and 3) flexible control scale. Experiments show that ours can effectively add an emotion controller to existing TTS, and achieves state-of-the-art performance with emotion similarity scores: Emo-SIM and Aro-Val SIM. The project page is available at: https://curryjung.github.io/ttsctrlnet_project_page","short_abstract":"Recent advances in text-to-speech (TTS) have enabled natural speech synthesis, but fine-grained, time-varying emotion control remains challenging. Existing methods often allow only utterance-level control and require full model fine-tuning with a large emotion speech dataset, which can degrade performance. Inspired by...","url_abs":"https://arxiv.org/abs/2507.04349","url_pdf":"https://arxiv.org/pdf/2507.04349v1","authors":"[\"Jaeseok Jeong\",\"Yuna Lee\",\"Mingi Kwon\",\"Youngjung Uh\"]","published":"2025-07-06T11:22:08Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"eess.AS\"]","methods":"[]","has_code":false}
