{"ID":2869078,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14515","arxiv_id":"2509.14515","title":"From Turn-Taking to Synchronous Dialogue: A Survey of Full-Duplex Spoken Language Models","abstract":"True Full-Duplex (TFD) voice communication--enabling simultaneous listening and speaking with natural turn-taking, overlapping speech, and interruptions--represents a critical milestone toward human-like AI interaction. This survey comprehensively reviews Full-Duplex Spoken Language Models (FD-SLMs) in the LLM era. We establish a taxonomy distinguishing Engineered Synchronization (modular architectures) from Learned Synchronization (end-to-end architectures), and unify fragmented evaluation approaches into a framework encompassing Temporal Dynamics, Behavioral Arbitration, Semantic Coherence, and Acoustic Performance. Through comparative analysis of mainstream FD-SLMs, we identify fundamental challenges: synchronous data scarcity, architectural divergence, and evaluation gaps, providing a roadmap for advancing human-AI communication.","short_abstract":"True Full-Duplex (TFD) voice communication--enabling simultaneous listening and speaking with natural turn-taking, overlapping speech, and interruptions--represents a critical milestone toward human-like AI interaction. This survey comprehensively reviews Full-Duplex Spoken Language Models (FD-SLMs) in the LLM era. We...","url_abs":"https://arxiv.org/abs/2509.14515","url_pdf":"https://arxiv.org/pdf/2509.14515v1","authors":"[\"Yuxuan Chen\",\"Haoyuan Yu\"]","published":"2025-09-18T01:00:58Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.SD\",\"eess.AS\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
