{"ID":2891388,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.17548","arxiv_id":"2507.17548","title":"CodeReasoner: Enhancing the Code Reasoning Ability with Reinforcement Learning","abstract":"Code reasoning is a fundamental capability for large language models (LLMs) in the code domain. It involves understanding and predicting a program's execution behavior, such as determining the output for a given input or whether a specific statement will be executed. This capability is essential for downstream tasks like debugging, code generation, and program repair. Prior approaches mainly rely on supervised fine-tuning to improve performance in code reasoning tasks. However, they often show limited gains and fail to generalize across diverse scenarios. We argue this is due to two core issues: the low quality of training data and the limitations of supervised fine-tuning, which struggles to teach general reasoning skills. To address these challenges, we propose CodeReasoner, a framework that spans both dataset construction and a two-stage training process. First, we introduce a method to construct datasets that focus on the core execution logic of Python programs. Next, we apply instruction tuning to inject execution-specific knowledge distilled from a powerful teacher model. We then enhance reasoning and generalization through GRPO reinforcement learning on top of the fine-tuned model. Experiments on three widely-used code reasoning benchmarks show that CodeReasoner improves performance by 27.1% to 40.2% over prior methods using a 7B model. Notably, the 7B model matches GPT-4o on key tasks like input/output and coverage prediction. When scaled to 14B, CodeReasoner outperforms GPT-4o across all benchmarks. Ablation studies confirm the effectiveness of each training stage and highlight the importance of reasoning chains.","short_abstract":"Code reasoning is a fundamental capability for large language models (LLMs) in the code domain. It involves understanding and predicting a program's execution behavior, such as determining the output for a given input or whether a specific statement will be executed. This capability is essential for downstream tasks li...","url_abs":"https://arxiv.org/abs/2507.17548","url_pdf":"https://arxiv.org/pdf/2507.17548v1","authors":"[\"Lingxiao Tang\",\"He Ye\",\"Zhongxin Liu\",\"Xiaoxue Ren\",\"Lingfeng Bao\"]","published":"2025-07-23T14:26:58Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}