{"ID":2825476,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.21120","arxiv_id":"2512.21120","title":"ClarifyMT-Bench: Benchmarking and Improving Multi-Turn Clarification for Conversational Large Language Models","abstract":"Large language models (LLMs) are increasingly deployed as conversational assistants in open-domain, multi-turn settings, where users often provide incomplete or ambiguous information. However, existing LLM-focused clarification benchmarks primarily assume single-turn interactions or cooperative users, limiting their ability to evaluate clarification behavior in realistic settings. We introduce \\textbf{ClarifyMT-Bench}, a benchmark for multi-turn clarification grounded in a five-dimensional ambiguity taxonomy and a set of six behaviorally diverse simulated user personas. Through a hybrid LLM-human pipeline, we construct 6,120 multi-turn dialogues capturing diverse ambiguity sources and interaction patterns. Evaluating ten representative LLMs uncovers a consistent under-clarification bias: LLMs tend to answer prematurely, and performance degrades as dialogue depth increases. To mitigate this, we propose \\textbf{ClarifyAgent}, an agentic approach that decomposes clarification into perception, forecasting, tracking, and planning, substantially improving robustness across ambiguity conditions. ClarifyMT-Bench establishes a reproducible foundation for studying when LLMs should ask, when they should answer, and how to navigate ambiguity in real-world human-LLM interactions.","short_abstract":"Large language models (LLMs) are increasingly deployed as conversational assistants in open-domain, multi-turn settings, where users often provide incomplete or ambiguous information. However, existing LLM-focused clarification benchmarks primarily assume single-turn interactions or cooperative users, limiting their ab...","url_abs":"https://arxiv.org/abs/2512.21120","url_pdf":"https://arxiv.org/pdf/2512.21120v1","authors":"[\"Sichun Luo\",\"Yi Huang\",\"Mukai Li\",\"Shichang Meng\",\"Fengyuan Liu\",\"Zefa Hu\",\"Junlan Feng\",\"Qi Liu\"]","published":"2025-12-24T11:39:00Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.IR\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}