{"ID":2847078,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.01937","arxiv_id":"2511.01937","title":"Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR","abstract":"Large language models (LLMs) trained for step-by-step reasoning often become excessively verbose, raising inference cost. Standard Reinforcement Learning with Verifiable Rewards (RLVR) pipelines filter out ``easy'' problems for training efficiency, leaving the model to train primarily on harder problems that require longer reasoning chains. This skews the output length distribution upward, resulting in a \\textbf{model that conflates ``thinking longer'' with ``thinking better''}. In this work, we show that retaining and modestly up-weighting moderately easy problems acts as an implicit length regularizer. Exposing the model to solvable short-chain tasks constrains its output distribution and prevents runaway verbosity. The result is \\textbf{\\emph{emergent brevity for free}}: the model learns to solve harder problems without inflating the output length, \\textbf{ despite the absence of any explicit length penalization}. RLVR experiments using this approach on \\textit{Qwen3-4B-Thinking-2507} (with a 16k token limit) achieve baseline pass@1 AIME25 accuracy while generating solutions that are, on average, nearly twice as short. The code is available at \\href{https://github.com/MBZUAI-Paris/Frugal-AI}{GitHub}, with datasets and models on \\href{https://huggingface.co/collections/MBZUAI-Paris/k2-think-mini-68dcfa8b114686a4bd3dc2bc}{Hugging Face}.","short_abstract":"Large language models (LLMs) trained for step-by-step reasoning often become excessively verbose, raising inference cost. Standard Reinforcement Learning with Verifiable Rewards (RLVR) pipelines filter out ``easy'' problems for training efficiency, leaving the model to train primarily on harder problems that require lo...","url_abs":"https://arxiv.org/abs/2511.01937","url_pdf":"https://arxiv.org/pdf/2511.01937v2","authors":"[\"Abdelaziz Bounhar\",\"Hadi Abdine\",\"Evan Dufraisse\",\"Ahmad Chamma\",\"Amr Mohamed\",\"Dani Bouch\",\"Michalis Vazirgiannis\",\"Guokan Shang\"]","published":"2025-11-02T17:29:16Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"stat.ML\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":607484,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2847078,"paper_url":"https://arxiv.org/abs/2511.01937","paper_title":"Shorter but not Worse: Frugal Reasoning via Easy Samples as Length Regularizers in Math RLVR","repo_url":"https://github.com/MBZUAI-Paris/Frugal-AI","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
