{"ID":2850588,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.21339","arxiv_id":"2510.21339","title":"Multi-turn Training with Basic Human Feedback Helps Little on LLM Reasoning","abstract":"The reasoning capabilities of Large Language Models (LLMs) are typically developed through the single-turn reinforcement learning, whereas real-world applications often involve multi-turn interactions with human feedback, leading to a potential mismatch between training and deployment conditions. In this work, we study whether multi-turn training with human feedback is necessary for reasoning tasks. We compare conventional single-turn training with three multi-turn strategies and reach contrary conclusions to previous research. We find that models trained in a single-turn setting generalize effectively to both single- and multi-turn evaluations, while models trained with multi-turn strategies exhibit a significant degradation in single-turn reasoning performance. These results suggest that for tasks with complete information, robust single-turn training remains more effective and reliable, as multi-turn training with basic feedback provides limited benefits and can even degrade reasoning capabilities.","short_abstract":"The reasoning capabilities of Large Language Models (LLMs) are typically developed through the single-turn reinforcement learning, whereas real-world applications often involve multi-turn interactions with human feedback, leading to a potential mismatch between training and deployment conditions. In this work, we study...","url_abs":"https://arxiv.org/abs/2510.21339","url_pdf":"https://arxiv.org/pdf/2510.21339v2","authors":"[\"Qiang Liu\",\"Wuganjing Song\",\"Zhenzhou Lin\",\"Feifan Chen\",\"Qiaolong Cai\",\"Chen Li\",\"Yongduo Sui\"]","published":"2025-10-24T11:08:32Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.IT\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}
