{"ID":2845210,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04108","arxiv_id":"2511.04108","title":"Batch Prompting Suppresses Overthinking Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models","abstract":"Large Reasoning Models (LRMs) achieve strong performance through explicit chain-of-thought reasoning but suffer from \\textit{overthinking}: generating excessive reasoning tokens even for trivial queries. Beyond inflating cost, overthinking can be self-defeating: models enter recursive self-doubt loops that exhaust token budgets without producing an answer, causing API timeouts that directly hurt accuracy. We present an empirical study showing that \\textbf{batch prompting}, originally introduced for throughput optimization, effectively suppresses overthinking at inference time. Across 13 diverse benchmarks with DeepSeek-R1 and OpenAI-o1, batch prompting reduces reasoning tokens by 76\\% on average (2{,}950$\\mapsto$710) while preserving or improving accuracy. Through behavioral analysis, we find that batching induces three beneficial effects: (1) it reduces per-query reasoning effort when multiple queries share a context; (2) it enables pattern induction, where models generalize from earlier examples to solve later ones; and (3) it suppresses hedging behavior (e.g., ``\\texttt{wait,}'' ``\\texttt{let me double-check}'') that signals metacognitive loops. We also show that explicit prompt constraints (``\\texttt{Use no more than 100 tokens in thinking.}'') fail to reduce overthinking; models either ignore them or sacrifice accuracy. These findings reframe batch prompting as more than a cost optimization: it is a practical inference-time technique that improves efficiency and reliability without model modification.","short_abstract":"Large Reasoning Models (LRMs) achieve strong performance through explicit chain-of-thought reasoning but suffer from \\textit{overthinking}: generating excessive reasoning tokens even for trivial queries. Beyond inflating cost, overthinking can be self-defeating: models enter recursive self-doubt loops that exhaust tok...","url_abs":"https://arxiv.org/abs/2511.04108","url_pdf":"https://arxiv.org/pdf/2511.04108v4","authors":"[\"Saurabh Srivastava\",\"Janit Bidhan\",\"Hao Yan\",\"Abhishek Dey\",\"Tanu Kansal\",\"Paras Kath\",\"Sina Mansouri\",\"Mohit Marvania\",\"Vamsi Shankar Simhadri\",\"Gaurav Singh\"]","published":"2025-11-06T06:47:39Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
