{"ID":2848592,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.25510","arxiv_id":"2510.25510","title":"MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL","abstract":"As large language models (LLMs) are increasingly used in Text-to-SQL tasks, Reinforcement Learning (RL) has become a common method for improving performance. Existing methods primarily rely on static execution feedback, which restricts real-time error correction. However, integrating multi-turn tool invocation along with dynamic feedback could significantly improve adaptability and robustness, ultimately enhancing model performance. To address these issues, we propose MTIR-SQL, an innovative Multi-turn Tool-Integrated Reasoning reinforcement learning framework for Text-to-SQL. Our approach introduces an execution-aware multi-turn reasoning paradigm that seamlessly incorporates database execution feedback at each reasoning step, enabling context-sensitive query generation and progressive refinement throughout the reasoning process. The framework extends the GRPO algorithm to accommodate complex multi-turn interaction scenarios. Considering the training instability characteristics of MTIR and the potential for significant Deviation of model distribution from the initial model, we enhance the GRPO algorithm by adding a trajectory filtering mechanism and removing KL loss constraints. Experimental results demonstrate that MTIR-SQL, with 4B parameters, achieves \\textbf{64.4}\\% accuracy in the BIRD Dev and 84.6% execution accuracy in the SPIDER Dev, significantly outperforming existing approaches.","short_abstract":"As large language models (LLMs) are increasingly used in Text-to-SQL tasks, Reinforcement Learning (RL) has become a common method for improving performance. Existing methods primarily rely on static execution feedback, which restricts real-time error correction. However, integrating multi-turn tool invocation along wi...","url_abs":"https://arxiv.org/abs/2510.25510","url_pdf":"https://arxiv.org/pdf/2510.25510v1","authors":"[\"Zekun Xu\",\"Siyu Xia\",\"Chuhuai Yue\",\"Jiajun Chai\",\"Mingxue Tian\",\"Xiaohan Wang\",\"Wei Lin\",\"Haoxuan Li\",\"Guojun Yin\"]","published":"2025-10-29T13:34:27Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
