{"ID":2822604,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.02179","arxiv_id":"2601.02179","title":"Confidence Estimation for LLMs in Multi-turn Interactions","abstract":"While confidence estimation is a promising direction for mitigating hallucinations in Large Language Models (LLMs), current research overwhelmingly focuses on single-turn settings. The dynamics of model confidence in multi-turn conversations, where context accumulates and ambiguity is progressively resolved, remain largely unexplored. This work presents the first systematic study of confidence estimation in multi-turn interactions, establishing a formal evaluation framework grounded in two key desiderata: per-turn calibration and monotonicity of confidence as more information becomes available. To facilitate this, we introduce novel metrics, including a length-normalized Expected Calibration Error (InfoECE), and a new \"Hinter-Guesser\" paradigm for generating controlled evaluation datasets. Our experiments reveal that widely-used confidence techniques struggle with calibration and monotonicity in multi-turn dialogues. In contrast, a novel logit-based probe we introduce, P(Sufficient), proves comparatively more effective, robustly tracking evidence accumulation and distinguishing it from conversational filler. Our work provides a foundational methodology for developing more reliable and trustworthy conversational agents.","short_abstract":"While confidence estimation is a promising direction for mitigating hallucinations in Large Language Models (LLMs), current research overwhelmingly focuses on single-turn settings. The dynamics of model confidence in multi-turn conversations, where context accumulates and ambiguity is progressively resolved, remain lar...","url_abs":"https://arxiv.org/abs/2601.02179","url_pdf":"https://arxiv.org/pdf/2601.02179v2","authors":"[\"Caiqi Zhang\",\"Ruihan Yang\",\"Xiaochen Zhu\",\"Chengzu Li\",\"Tiancheng Hu\",\"Yijiang River Dong\",\"Deqing Yang\",\"Nigel Collier\"]","published":"2026-01-05T14:58:04Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}