{"ID":2860342,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.06261","arxiv_id":"2510.06261","title":"AlphaApollo: A System for Deep Agentic Reasoning","abstract":"We present AlphaApollo, an agentic reasoning system that targets two bottlenecks in foundation-model reasoning: (1) limited reasoning capacity for complex, long-horizon problem solving and (2) unreliable test-time evolution without trustworthy verification. AlphaApollo orchestrates models and tools via three components: (i) multi-turn agentic reasoning, which formalizes model-environment interaction with structured tool calls and responses; (ii) multi-turn agentic learning, which applies turn-level reinforcement learning to optimize tool-use reasoning while decoupling actions from tool responses for stable training; and (iii) multi-round agentic evolution, which refines solutions through a propose-judge-update loop with tool-assisted verifications and long-horizon memory. Across seven math reasoning benchmarks and multiple model scales, AlphaApollo improves performance through reliable tool use (\u003e 85% tool-call success), substantial gains from multi-turn RL (Avg@32: Qwen2.5-1.5B-Instruct 1.07% -\u003e 9.64%, Qwen2.5-7B-Instruct 8.77% -\u003e 20.35%), and improvements from evolution (e.g., Qwen2.5-3B-Instruct 5.27% -\u003e 7.70%, Qwen2.5-14B-Instruct 16.53% -\u003e 21.08%). This project is still ongoing. We welcome feedback from the community and will frequently update the source code and technical report.","short_abstract":"We present AlphaApollo, an agentic reasoning system that targets two bottlenecks in foundation-model reasoning: (1) limited reasoning capacity for complex, long-horizon problem solving and (2) unreliable test-time evolution without trustworthy verification. AlphaApollo orchestrates models and tools via three components...","url_abs":"https://arxiv.org/abs/2510.06261","url_pdf":"https://arxiv.org/pdf/2510.06261v2","authors":"[\"Zhanke Zhou\",\"Chentao Cao\",\"Xiao Feng\",\"Xuan Li\",\"Zongze Li\",\"Xiangyu Lu\",\"Jiangchao Yao\",\"Weikai Huang\",\"Tian Cheng\",\"Jianghangfan Zhang\",\"Tangyu Jiang\",\"Linrui Xu\",\"Yiming Zheng\",\"Brando Miranda\",\"Tongliang Liu\",\"Sanmi Koyejo\",\"Masashi Sugiyama\",\"Bo Han\"]","published":"2025-10-05T15:42:24Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
