{"ID":2865281,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.22243","arxiv_id":"2509.22243","title":"FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction","abstract":"Full-Duplex Speech-to-Speech Large Language Models (LLMs) are foundational to natural human-computer interaction, enabling real-time spoken dialogue systems. However, benchmarking and modeling these models remains a fundamental challenge. We introduce FLEXI, the first benchmark for full-duplex LLM-human spoken interaction that explicitly incorporates model interruption in emergency scenarios. FLEXI systematically evaluates the latency, quality, and conversational effectiveness of real-time dialogue through six diverse human-LLM interaction scenarios, revealing significant gaps between open source and commercial models in emergency awareness, turn terminating, and interaction latency. Finally, we suggest that next token-pair prediction offers a promising path toward achieving truly seamless and human-like full-duplex interaction.","short_abstract":"Full-Duplex Speech-to-Speech Large Language Models (LLMs) are foundational to natural human-computer interaction, enabling real-time spoken dialogue systems. However, benchmarking and modeling these models remains a fundamental challenge. We introduce FLEXI, the first benchmark for full-duplex LLM-human spoken interact...","url_abs":"https://arxiv.org/abs/2509.22243","url_pdf":"https://arxiv.org/pdf/2509.22243v1","authors":"[\"Yuan Ge\",\"Saihan Chen\",\"Jingqi Xiao\",\"Xiaoqian Liu\",\"Tong Xiao\",\"Yan Xiang\",\"Zhengtao Yu\",\"Jingbo Zhu\"]","published":"2025-09-26T11:57:42Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
