{"ID":2866399,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19817","arxiv_id":"2509.19817","title":"MMedFD: A Real-world Healthcare Benchmark for Multi-turn Full-Duplex Automatic Speech Recognition","abstract":"Automatic speech recognition (ASR) in clinical dialogue demands robustness to full-duplex interaction, speaker overlap, and low-latency constraints, yet open benchmarks remain scarce. We present MMedFD, the first real-world Chinese healthcare ASR corpus designed for multi-turn, full-duplex settings. Captured from a deployed AI assistant, the dataset comprises 5,805 annotated sessions with synchronized user and mixed-channel views, RTTM/CTM timing, and role labels. We introduce a model-agnostic pipeline for streaming segmentation, speaker attribution, and dialogue memory, and fine-tune Whisper-small on role-concatenated audio for long-context recognition. ASR evaluation includes WER, CER, and HC-WER, which measures concept-level accuracy across healthcare settings. LLM-generated responses are assessed using rubric-based and pairwise protocols. MMedFD establishes a reproducible framework for benchmarking streaming ASR and end-to-end duplex agents in healthcare deployment. The dataset and related resources are publicly available at https://github.com/Kinetics-JOJO/MMedFD","short_abstract":"Automatic speech recognition (ASR) in clinical dialogue demands robustness to full-duplex interaction, speaker overlap, and low-latency constraints, yet open benchmarks remain scarce. We present MMedFD, the first real-world Chinese healthcare ASR corpus designed for multi-turn, full-duplex settings. Captured from a dep...","url_abs":"https://arxiv.org/abs/2509.19817","url_pdf":"https://arxiv.org/pdf/2509.19817v2","authors":"[\"Hongzhao Chen\",\"XiaoYang Wang\",\"Jing Lan\",\"Hexiao Ding\",\"Yufeng Jiang\",\"MingHui Yang\",\"DanHui Xu\",\"Jun Luo\",\"Nga-Chun Ng\",\"Gerald W. Y. Cheng\",\"Yunlin Mao\",\"Jung Sun Yoo\"]","published":"2025-09-24T06:56:26Z","proceeding":"eess.AS","tasks":"[\"eess.AS\"]","methods":"[\"Large Language Model\"]","has_code":false,"code_links":[{"ID":609369,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2866399,"paper_url":"https://arxiv.org/abs/2509.19817","paper_title":"MMedFD: A Real-world Healthcare Benchmark for Multi-turn Full-Duplex Automatic Speech Recognition","repo_url":"https://github.com/Kinetics-JOJO/MMedFD","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
