{"ID":2891474,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.17735","arxiv_id":"2507.17735","title":"Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data","abstract":"Accent normalization converts foreign-accented speech into native-like speech while preserving speaker identity. We propose a novel pipeline using self-supervised discrete tokens and non-parallel training data. The system extracts tokens from source speech, converts them through a dedicated model, and synthesizes the output using flow matching. Our method demonstrates superior performance over a frame-to-frame baseline in naturalness, accentedness reduction, and timbre preservation across multiple English accents. Through token-level phonetic analysis, we validate the effectiveness of our token-based approach. We also develop two duration preservation methods, suitable for applications such as dubbing.","short_abstract":"Accent normalization converts foreign-accented speech into native-like speech while preserving speaker identity. We propose a novel pipeline using self-supervised discrete tokens and non-parallel training data. The system extracts tokens from source speech, converts them through a dedicated model, and synthesizes the o...","url_abs":"https://arxiv.org/abs/2507.17735","url_pdf":"https://arxiv.org/pdf/2507.17735v1","authors":"[\"Qibing Bai\",\"Sho Inoue\",\"Shuai Wang\",\"Zhongjie Jiang\",\"Yannan Wang\",\"Haizhou Li\"]","published":"2025-07-23T17:51:03Z","proceeding":"eess.AS","tasks":"[\"eess.AS\",\"cs.SD\"]","methods":"[]","has_code":false}