{"ID":2863222,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24193","arxiv_id":"2509.24193","title":"AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play","abstract":"Search-augmented LLMs often struggle with complex reasoning tasks due to ineffective multi-hop retrieval and limited reasoning ability. We propose AceSearcher, a cooperative self-play framework that trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. AceSearcher couples supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final answer accuracy, eliminating the need for intermediate annotations. Extensive experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact match improvement of 7.6%. Remarkably, on document-level finance reasoning tasks, AceSearcher-32B matches the performance of the DeepSeek-V3 model using less than 5% of its parameters. Even at smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9x more parameters, highlighting its exceptional efficiency and effectiveness in tackling complex reasoning tasks. Our code will be published at https://github.com/ritaranx/AceSearcher and https://huggingface.co/AceSearcher.","short_abstract":"Search-augmented LLMs often struggle with complex reasoning tasks due to ineffective multi-hop retrieval and limited reasoning ability. We propose AceSearcher, a cooperative self-play framework that trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries...","url_abs":"https://arxiv.org/abs/2509.24193","url_pdf":"https://arxiv.org/pdf/2509.24193v1","authors":"[\"Ran Xu\",\"Yuchen Zhuang\",\"Zihan Dong\",\"Jonathan Wang\",\"Yue Yu\",\"Joyce C. Ho\",\"Linjun Zhang\",\"Haoyu Wang\",\"Wenqi Shi\",\"Carl Yang\"]","published":"2025-09-29T02:14:30Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.IR\",\"cs.LG\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":608985,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2863222,"paper_url":"https://arxiv.org/abs/2509.24193","paper_title":"AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play","repo_url":"https://github.com/ritaranx/AceSearcher","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}