{"ID":2834534,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.01725","arxiv_id":"2512.01725","title":"Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks","abstract":"Large Language Models (LLMs) excel in reasoning tasks requiring a single correct answer, but they perform poorly in multi-solution tasks that require generating comprehensive and diverse answers. We attribute this limitation to \\textbf{reasoning overconfidence}: a tendency to express undue certainty in an incomplete solution set. To examine the effect, we introduce \\textit{MuSoBench}, a benchmark of multi-solution problems. Experiments show that the conventional short chain-of-thought (Short-CoT) prompting paradigm exhibits pronounced overconfidence, whereas the emerging long chain-of-thought (Long-CoT) approach mitigates it through iterative exploration and self-reflection. We further characterise observable behaviours and influential factors. To probe the underlying cause, we propose the \\textbf{cognitive-rigidity hypothesis}, which posits that overconfidence arises when the reasoning process prematurely converges on a narrow set of thought paths. An attention-entropy analysis offers preliminary support for this view. These findings provide tools for assessing the completeness of LLM reasoning and highlight the need to move evaluation beyond single-answer accuracy toward comprehensive exploration.","short_abstract":"Large Language Models (LLMs) excel in reasoning tasks requiring a single correct answer, but they perform poorly in multi-solution tasks that require generating comprehensive and diverse answers. We attribute this limitation to \\textbf{reasoning overconfidence}: a tendency to express undue certainty in an incomplete so...","url_abs":"https://arxiv.org/abs/2512.01725","url_pdf":"https://arxiv.org/pdf/2512.01725v1","authors":"[\"Jiannan Guan\",\"Qiguang Chen\",\"Libo Qin\",\"Dengyun Peng\",\"Jinhao Liu\",\"Liangyu Huo\",\"Jian Xie\",\"Wanxiang Che\"]","published":"2025-12-01T14:35:06Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}
