{"ID":2845477,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.04654","arxiv_id":"2511.04654","title":"Logit-Entropy Adaptive Stopping Heuristic for Efficient Chain-of-Thought Reasoning","abstract":"Chain-of-Thought (CoT) prompting is a key technique for enabling complex reasoning in large language models. However, generating full, fixed-length rationales is computationally wasteful, inflating both token usage and latency. We introduce LEASH: Logit-Entropy Adaptive Stopping Heuristic, a training-free decoding algorithm that adaptively halts rationale generation. LEASH monitors two intrinsic signals: the slope of token-level entropy and the improvement in the top-logit margin. It terminates the generation once both signals plateau, indicating the model has reached a stable reasoning state. Across four instruction-tuned models on the GSM8K and AQuA-RAT benchmarks, LEASH reduces average token generation by 30--35% and latency by 27%, while incurring a 10 p.p. accuracy drop relative to CoT. LEASH is model-agnostic and requires no additional training or supervision, offering a simple and efficient alternative to CoT decoding.","short_abstract":"Chain-of-Thought (CoT) prompting is a key technique for enabling complex reasoning in large language models. However, generating full, fixed-length rationales is computationally wasteful, inflating both token usage and latency. We introduce LEASH: Logit-Entropy Adaptive Stopping Heuristic, a training-free decoding algo...","url_abs":"https://arxiv.org/abs/2511.04654","url_pdf":"https://arxiv.org/pdf/2511.04654v1","authors":"[\"Mohammad Atif Quamar\",\"Mohammad Areeb\"]","published":"2025-11-06T18:43:16Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false}