{"ID":2847954,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.26287","arxiv_id":"2510.26287","title":"Empowering RepoQA-Agent based on Reinforcement Learning Driven by Monte-carlo Tree Search","abstract":"Repository-level software engineering tasks require large language models (LLMs) to efficiently navigate and extract information from complex codebases through multi-turn tool interactions. Existing approaches face significant limitations: training-free, in-context learning methods struggle to guide agents effectively in tool utilization and decision-making based on environmental feedback, while training-based approaches typically rely on costly distillation from larger LLMs, introducing data compliance concerns in enterprise environments. To address these challenges, we introduce RepoSearch-R1, a novel agentic reinforcement learning framework driven by Monte-carlo Tree Search (MCTS). This approach allows agents to generate diverse, high-quality reasoning trajectories via self-training without requiring model distillation or external supervision. Based on RepoSearch-R1, we construct a RepoQA-Agent specifically designed for repository question-answering tasks. Comprehensive evaluation on repository question-answering tasks demonstrates that RepoSearch-R1 achieves substantial improvements of answer completeness: 16.0% enhancement over no-retrieval methods, 19.5% improvement over iterative retrieval methods, and 33% increase in training efficiency compared to general agentic reinforcement learning approaches. Our cold-start training methodology eliminates data compliance concerns while maintaining robust exploration diversity and answer completeness across repository-level reasoning tasks.","short_abstract":"Repository-level software engineering tasks require large language models (LLMs) to efficiently navigate and extract information from complex codebases through multi-turn tool interactions. Existing approaches face significant limitations: training-free, in-context learning methods struggle to guide agents effectively...","url_abs":"https://arxiv.org/abs/2510.26287","url_pdf":"https://arxiv.org/pdf/2510.26287v1","authors":"[\"Guochang Li\",\"Yuchen Liu\",\"Zhen Qin\",\"Yunkun Wang\",\"Jianping Zhong\",\"Chen Zhi\",\"Binhua Li\",\"Fei Huang\",\"Yongbin Li\",\"Shuiguang Deng\"]","published":"2025-10-30T09:10:36Z","proceeding":"cs.SE","tasks":"[\"cs.SE\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}
