{"ID":2850893,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.22075","arxiv_id":"2510.22075","title":"Agentic Reinforcement Learning for Real-World Code Repair","abstract":"We tackle the challenge of training reliable code-fixing agents in real repositories, where complex builds and shifting dependencies make evaluation unstable. We developed a verifiable pipeline with success defined as post-fix build validation and improved reproducibility across ~1K real issues by pinning dependencies and disabling automatic upgrades. Building on this, we introduced a scalable simplified pipeline for large-scale reinforcement learning (RL). Using this setup, we supervised fine-tuned Qwen3-32B in the full pipeline and applied RL on top of the SFT model in the simplified environment. The SFT model distilled from GPT-4.1 trajectories performs on par while being 56x smaller, and RL added 7-20% absolute gains under matched train-test conditions. \"Thinking mode\" was on par or worse in our experiments. Both SFT and RL models failed to generalize across environments, highlighting the importance of matching train-test environments for building reliable real-world code-fixing agents.","short_abstract":"We tackle the challenge of training reliable code-fixing agents in real repositories, where complex builds and shifting dependencies make evaluation unstable. We developed a verifiable pipeline with success defined as post-fix build validation and improved reproducibility across ~1K real issues by pinning dependencies...","url_abs":"https://arxiv.org/abs/2510.22075","url_pdf":"https://arxiv.org/pdf/2510.22075v1","authors":"[\"Siyu Zhu\",\"Anastasiya Karpovich\",\"Albert Chen\",\"Jessica Koscheka\",\"Shailesh Jannu\",\"Di Wen\",\"Yuqing Zhu\",\"Rohit Jain\",\"Alborz Geramifard\"]","published":"2025-10-24T23:25:02Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.CL\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
