{"ID":2886781,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.02178","arxiv_id":"2508.02178","title":"Reconsidering Overthinking: Penalizing Internal and External Redundancy in CoT Reasoning","abstract":"Large Reasoning Models (LRMs) often suffer from overthinking, generating verbose reasoning traces that compromise both computational efficiency and interpretability. Unlike prior efforts that rely on global length-based rewards, we propose a semantic-aware decomposition of redundancy into two distinct forms: internal redundancy (informational stagnation within the reasoning process) and external redundancy (superfluous continuation after the final answer). We introduce a dual-penalty reinforcement learning framework that surgically targets these inefficiencies: a sliding-window semantic analysis is employed to penalize low-gain steps within the reasoning trajectory, while a normalized metric suppresses the post-answer tail. Extensive experiments demonstrate that our method significantly compresses Chain-of-Thought traces with minimal accuracy degradation, while maintaining strong generalization to out-of-domain tasks. Crucially, we reveal an asymmetry in redundancy: external redundancy can be safely eliminated without performance loss, whereas internal redundancy removal requires a calibrated trade-off to maintain reasoning fidelity. Our framework enables fine-grained, implicit control over reasoning length, paving the way for more concise and interpretable LRMs.","short_abstract":"Large Reasoning Models (LRMs) often suffer from overthinking, generating verbose reasoning traces that compromise both computational efficiency and interpretability. Unlike prior efforts that rely on global length-based rewards, we propose a semantic-aware decomposition of redundancy into two distinct forms: internal r...","url_abs":"https://arxiv.org/abs/2508.02178","url_pdf":"https://arxiv.org/pdf/2508.02178v2","authors":"[\"Jialiang Hong\",\"Taihang Zhen\",\"Kai Chen\",\"Jiaheng Liu\",\"Junlan Feng\",\"Wenpeng Zhu\",\"Jing Huo\",\"Yang Gao\",\"Depeng Wang\",\"Haitao Wan\",\"Xi Yang\",\"Boyan Wang\",\"Fanyu Meng\",\"Yuyao Zhang\"]","published":"2025-08-04T08:22:14Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
