{"ID":2892343,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.15758","arxiv_id":"2507.15758","title":"LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization","abstract":"Large reasoning models have achieved remarkable performance through extended chain-of-thought sequences, yet this computational freedom leads to excessive token generation even for simple problems. We present Length-Adaptive Policy Optimization (LAPO), a novel framework that transforms reasoning length control from an external constraint into an intrinsic model capability. Unlike existing approaches that impose rigid limits or rely on post-hoc interventions, LAPO enables models to internalize an understanding of appropriate reasoning depth through a two-stage reinforcement learning process. In the first stage, models learn natural reasoning patterns by discovering the statistical distribution of successful solution lengths. The second stage leverages these patterns as meta-cognitive guidance, embedding them directly within the model's reasoning context to ensure inference-time flexibility. Experiments on mathematical reasoning benchmarks demonstrate that LAPO reduces token usage by up to 40.9% while improving accuracy by 2.3%. Our analysis reveals that models trained with LAPO develop emergent abilities to allocate computational resources based on problem complexity, achieving efficient reasoning without sacrificing quality.","short_abstract":"Large reasoning models have achieved remarkable performance through extended chain-of-thought sequences, yet this computational freedom leads to excessive token generation even for simple problems. We present Length-Adaptive Policy Optimization (LAPO), a novel framework that transforms reasoning length control from an...","url_abs":"https://arxiv.org/abs/2507.15758","url_pdf":"https://arxiv.org/pdf/2507.15758v2","authors":"[\"Xingyu Wu\",\"Yuchen Yan\",\"Shangke Lyu\",\"Linjuan Wu\",\"Yiwen Qiu\",\"Yongliang Shen\",\"Weiming Lu\",\"Jian Shao\",\"Jun Xiao\",\"Yueting Zhuang\"]","published":"2025-07-21T16:14:41Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
