{"ID":2881647,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.12001","arxiv_id":"2508.12001","title":"FNH-TTS: Mixture-of-Experts Duration Modeling for Robust Neural Speech Synthesis","abstract":"Current non-autoregressive (NAR) text-to-speech (TTS) systems still struggle to model diverse and speaker-dependent duration variation. We further observe that richer duration variation can increase the synthesis difficulty of existing HiFi-GAN-based vocoders, leading to spectral artifacts and unstable time-frequency structures. To address these issues, we propose FNH-TTS, a VITS-based end-to-end TTS system with Mixture-of-Experts duration modeling and robust vocoder-side synthesis. Specifically, we introduce a Mixture-of-Experts Duration Predictor (MoE-DP) to capture diverse phoneme duration patterns and speaker-dependent speaking-rate characteristics. To convert richer duration variation into stable waveform generation, we further integrate a VOCOS-style vocoder with Collaborative Multi-Band and Sub-Band Discriminators. Experiments on LJSpeech, VCTK, and LibriTTS show that FNH-TTS achieves improved synthesis quality, duration-category accuracy, vocoder reconstruction quality, and inference efficiency. Further analysis shows that MoE-DP is the main source of improved duration modeling, while stronger vocoder-side components are necessary for robust synthesis under richer duration variation.","short_abstract":"Current non-autoregressive (NAR) text-to-speech (TTS) systems still struggle to model diverse and speaker-dependent duration variation. We further observe that richer duration variation can increase the synthesis difficulty of existing HiFi-GAN-based vocoders, leading to spectral artifacts and unstable time-frequency s...","url_abs":"https://arxiv.org/abs/2508.12001","url_pdf":"https://arxiv.org/pdf/2508.12001v3","authors":"[\"Qingliang Meng\",\"Yuqing Deng\",\"Wei Liang\",\"Limei Yu\",\"Huizhi Liang\",\"Tian Li\"]","published":"2025-08-16T10:04:21Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
