{"ID":2866346,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19736","arxiv_id":"2509.19736","title":"UserRL: Training Interactive User-Centric Agent via Reinforcement Learning","abstract":"Reinforcement learning (RL) has shown promise in training agentic models that move beyond static benchmarks to engage in dynamic, multi-turn interactions. Yet, the ultimate value of such agents lies in their ability to assist users, a setting where diversity and dynamics of user interaction pose challenges. In this work, we propose UserRL, a unified framework for training and evaluating user-centric abilities through standardized gym environments paired with simulated users. We systematically vary turn-level reward assignment and trajectory-level score calculation to analyze how different formulations affect learning under the GRPO algorithm. Our experiments across Qwen3 models reveal three key findings: (i) SFT cold start is critical for unlocking initial interaction ability and enabling sustained RL improvements; (ii) deliberate trajectory scoring yields more efficient and effective multi-turn interactions; and (iii) while stronger simulated users (e.g., GPT-4o) facilitates training, open-source simulators (e.g., Qwen3-32B) remain a cost-effective and transferable option. Together, these results highlight that careful design of reward shaping and user simulation choice is as crucial as model scale, and establish UserRL as a practical pathway for developing robust user-centric agentic models. All codes and data are public for future research.","short_abstract":"Reinforcement learning (RL) has shown promise in training agentic models that move beyond static benchmarks to engage in dynamic, multi-turn interactions. Yet, the ultimate value of such agents lies in their ability to assist users, a setting where diversity and dynamics of user interaction pose challenges. In this wor...","url_abs":"https://arxiv.org/abs/2509.19736","url_pdf":"https://arxiv.org/pdf/2509.19736v1","authors":"[\"Cheng Qian\",\"Zuxin Liu\",\"Akshara Prabhakar\",\"Jielin Qiu\",\"Zhiwei Liu\",\"Haolin Chen\",\"Shirley Kokane\",\"Heng Ji\",\"Weiran Yao\",\"Shelby Heinecke\",\"Silvio Savarese\",\"Caiming Xiong\",\"Huan Wang\"]","published":"2025-09-24T03:33:20Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
