{"ID":2875554,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.02521","arxiv_id":"2509.02521","title":"FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training","abstract":"Full-duplex dialog models aim to listen and speak simultaneously, delivering rapid responses to dynamic user input. Among different solutions to full-duplexity, a native solution merges multiple channels in each time step, achieving the lowest latency. However, prevailing designs break down the textual monologue sentences for word-level alignment with audio streams, which degrades language modeling abilities. To help address this issue, we introduce \"contiguous monologues\", which are composed by continuous sentences and \"waiting\" intervals, mimicking human-like cognitive behavior in dialogs. We find a proper training paradigm to be critical for semantically aligning contiguous monologues with audio. To this end, we develop a \"dual\" training paradigm that alternates the position of the monologues, either leading or trailing the audio, across different training stages. A combination of our contiguous monologue and dual training strategy is applied in developing FLM-Audio, our 7B spoken dialog chatbot with native full-duplexity. As confirmed by experimental results, FLM-Audio achieves superior response qualities and chatting experiences while requiring significantly less training data.","short_abstract":"Full-duplex dialog models aim to listen and speak simultaneously, delivering rapid responses to dynamic user input. Among different solutions to full-duplexity, a native solution merges multiple channels in each time step, achieving the lowest latency. However, prevailing designs break down the textual monologue senten...","url_abs":"https://arxiv.org/abs/2509.02521","url_pdf":"https://arxiv.org/pdf/2509.02521v3","authors":"[\"Yiqun Yao\",\"Xiang Li\",\"Xin Jiang\",\"Xuezhi Fang\",\"Naitong Yu\",\"Wenjia Ma\",\"Aixin Sun\",\"Yequan Wang\"]","published":"2025-09-02T17:18:49Z","proceeding":"cs.SD","tasks":"[\"cs.SD\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false}
