{"ID":2869724,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.13761","arxiv_id":"2509.13761","title":"THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning","abstract":"Large Language Models (LLMs) have made remarkable progress in mathematical reasoning, but still continue to struggle with high-precision tasks like numerical computation and formal symbolic manipulation. Integrating external tools has emerged as a promising approach to bridge this gap. Despite recent advances, existing methods struggle with three key challenges: constructing tool-integrated reasoning data, performing fine-grained optimization, and enhancing inference. To overcome these limitations, we propose THOR (Tool-Integrated Hierarchical Optimization via RL). First, we introduce TIRGen, a multi-agent based pipeline for constructing high-quality datasets of tool-integrated reasoning paths, aligning with the policy and generalizing well across diverse models. Second, to perform fine-grained hierarchical optimization, we introduce an RL strategy that jointly optimizes for both episode-level problem solving and step-level code generation. This is motivated by our key insight that the success of an intermediate tool call is a strong predictor of the final answer's correctness. Finally, THOR incorporates a self-correction mechanism that leverages immediate tool feedback to dynamically revise erroneous reasoning paths during inference. Our approach demonstrates strong generalization across diverse models, performing effectively in both reasoning and non-reasoning models. It further achieves state-of-the-art performance for models of a similar scale on multiple mathematical benchmarks, while also delivering consistent improvements on code benchmarks. Our code will be publicly available at https://github.com/JingMog/THOR.","short_abstract":"Large Language Models (LLMs) have made remarkable progress in mathematical reasoning, but still continue to struggle with high-precision tasks like numerical computation and formal symbolic manipulation. Integrating external tools has emerged as a promising approach to bridge this gap. Despite recent advances, existing...","url_abs":"https://arxiv.org/abs/2509.13761","url_pdf":"https://arxiv.org/pdf/2509.13761v3","authors":"[\"Qikai Chang\",\"Zhenrong Zhang\",\"Pengfei Hu\",\"Jun Du\",\"Jiefeng Ma\",\"Yicheng Pan\",\"Jianshu Zhang\",\"Quan Liu\",\"Jianqing Gao\"]","published":"2025-09-17T07:16:12Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609712,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2869724,"paper_url":"https://arxiv.org/abs/2509.13761","paper_title":"THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning","repo_url":"https://github.com/JingMog/THOR","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
