{"ID":2861855,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.00499","arxiv_id":"2510.00499","title":"MOSS-Speech: Towards True Speech-to-Speech Models Without Text Guidance","abstract":"Spoken dialogue systems often rely on cascaded pipelines that transcribe, process, and resynthesize speech. While effective, this design discards paralinguistic cues and limits expressivity. Recent end-to-end methods reduce latency and better preserve these cues, yet still rely on text intermediates, creating a fundamental bottleneck. We present MOSS-Speech, a true speech-to-speech large language model that directly understands and generates speech without relying on text guidance. Our approach combines a modality-based layer-splitting architecture with a frozen pre-training strategy, preserving the reasoning and knowledge of pretrained text LLMs while adding native speech capabilities. Experiments show that our model achieves state-of-the-art results in spoken question answering and delivers comparable speech-to-speech performance relative to existing text-guided systems, while still maintaining competitive text performance. By narrowing the gap between text-guided and direct speech generation, our work establishes a new paradigm for expressive and efficient end-to-end speech interaction.","short_abstract":"Spoken dialogue systems often rely on cascaded pipelines that transcribe, process, and resynthesize speech. While effective, this design discards paralinguistic cues and limits expressivity. Recent end-to-end methods reduce latency and better preserve these cues, yet still rely on text intermediates, creating a fundame...","url_abs":"https://arxiv.org/abs/2510.00499","url_pdf":"https://arxiv.org/pdf/2510.00499v2","authors":"[\"Xingjian Zhao\",\"Zhe Xu\",\"Qinyuan Cheng\",\"Zhaoye Fei\",\"Luozhijie Jin\",\"Yang Wang\",\"Hanfu Chen\",\"Yaozhou Jiang\",\"Qinghui Gao\",\"Ke Chen\",\"Ruixiao Li\",\"Mingshu Chen\",\"Ruiming Wang\",\"Wenbo Zhang\",\"Yiyang Zhang\",\"Donghua Yu\",\"Yang Gao\",\"Xiaogui Yang\",\"Yitian Gong\",\"Yuanfan Xu\",\"Yaqian Zhou\",\"Xuanjing Huang\",\"Xipeng Qiu\"]","published":"2025-10-01T04:32:37Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
