{"ID":2861891,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.00546","arxiv_id":"2510.00546","title":"ThinkBrake: Efficient Reasoning via Log-Probability Margin Guided Decoding","abstract":"Large Reasoning Models (LRMs) allocate substantial inference-time compute to Chain-of-Thought (CoT) reasoning, improving performance on mathematics, scientific QA, and tool usage. However, this introduces overthinking: LRMs often reach a correct intermediate solution, continue reasoning, and overwrite it with an incorrect answer. We first demonstrate that oracle stopping--where we inject \u003c/think\u003e at every sentence boundary and select the best stopping point in hindsight--improves average accuracy by 8% while reducing thinking tokens by 72%, exposing substantial overthinking. Motivated by this finding, we propose ThinkBrake, which monitors the log-probability margin between the top continuation token and \u003c/think\u003e at sentence boundaries, stopping reasoning when this margin narrows. ThinkBrake requires no training and achieves favorable accuracy-efficiency trade-offs across math, scientific QA, and tool usage benchmarks, reducing thinking token usage by up to 30%. Furthermore, we provide theoretical analysis showing that ThinkBrake is equivalent to test-time realignment with a reward bonus for the \u003c/think\u003e token.","short_abstract":"Large Reasoning Models (LRMs) allocate substantial inference-time compute to Chain-of-Thought (CoT) reasoning, improving performance on mathematics, scientific QA, and tool usage. However, this introduces overthinking: LRMs often reach a correct intermediate solution, continue reasoning, and overwrite it with an incorr...","url_abs":"https://arxiv.org/abs/2510.00546","url_pdf":"https://arxiv.org/pdf/2510.00546v5","authors":"[\"Sangjun Song\",\"Minjae Oh\",\"Seungkyu Lee\",\"Sungmin Jo\",\"Yohan Jo\"]","published":"2025-10-01T06:04:57Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[]","has_code":false}
