{"ID":2887193,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.01550","arxiv_id":"2508.01550","title":"RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale","abstract":"Training software engineering (SWE) LLMs is bottlenecked by expensive infrastructure, inefficient evaluation pipelines, scarce training data, and costly quality control. We present RepoForge, an autonomous, end-to-end pipeline that generates, evaluates, and trains SWE agents at scale. Our key contributions include: (1) RepoForge-8B-Agent, achieving 17.4\\% on SWE-Bench-Verified~\\citep{swebench_verified2024}, establishing new state-of-the-art for $\\leq$8B non-thinking LLMs; (2) 7,304 executable environments auto-generated from real GitHub commits with zero manual intervention; (3) 14$\\times$ storage reduction (1.4GB $\\rightarrow$ 102MB per instance) via intelligent dependency management and image pruning; (4) $\u003e$70\\% faster evaluation using a Ray-powered~\\citep{ray2018} distributed RepoForge harness; (5) 19,000$\\times$ cheaper labeling through our automated SPICE~\\citep{spice2024} difficulty assessment technique. By unifying storage-efficient sandboxing, Ray-powered evaluation harness, automated data generation, SPICE-based labeling, and bubble-free RL scaffold, we demonstrate that even $\\leq$8B models can reach new state-of-the-art performance on demanding benchmarks like SWE-Bench-Verified. Our approach addresses critical bottlenecks in SWE agent training: high storage costs of container-based evaluation, inefficient sequential reward pipelines, limited availability of high-quality training data, expensive manual labeling, and multi-turn RL pipeline bottlenecks.","short_abstract":"Training software engineering (SWE) LLMs is bottlenecked by expensive infrastructure, inefficient evaluation pipelines, scarce training data, and costly quality control. We present RepoForge, an autonomous, end-to-end pipeline that generates, evaluates, and trains SWE agents at scale. Our key contributions include: (1)...","url_abs":"https://arxiv.org/abs/2508.01550","url_pdf":"https://arxiv.org/pdf/2508.01550v2","authors":"[\"Zhilong Chen\",\"Chengzong Zhao\",\"Boyuan Chen\",\"Dayi Lin\",\"Yihao Chen\",\"Arthur Leung\",\"Gopi Krishnan Rajbahadur\",\"Gustavo A. Oliva\",\"Haoxiang Zhang\",\"Aaditya Bhatia\",\"Chong Chun Yong\",\"Ahmed E. Hassan\"]","published":"2025-08-03T02:34:16Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[\"Large Language Model\"]","has_code":false}
