{"ID":2864226,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.23738","arxiv_id":"2509.23738","title":"GUI-Shepherd: Reliable Process Reward and Verification for Long-Sequence GUI Tasks","abstract":"Autonomous agents for long-sequence Graphical User Interface tasks are hindered by sparse rewards and the intractable credit assignment problem. To address these challenges, we introduce GUI-Shepherd, a Process Reward Model that provides dense, step-by-step feedback to guide agents. GUI-Shepherd is trained on a diverse large-scale data set of $52$k interactions that features human-annotated scores and GPT-4o generated rationales, enabling it to serve both as a reward provider for RL training and as a verifier for inference. As far as we know, we are the first to conduct a systematic study of process supervision in GUI agents, across diverse settings from online long-horizon tasks to offline single-step prediction. On the online AndroidWorld benchmark, GUI-Shepherd improves success rate by $7.7$ points via multi-turn online PPO, significantly outperforming Outcome Reward Model based competitors. When used as an inference verifier, it brings a $5.1$-point improvement. The benefits generalize to the offline AndroidControl benchmark, with gains of $2.2$ points as a reward provider and $4.3$ points as a verifier. Collectively, our results establish that high-fidelity process supervision is critical for building more capable GUI agents and present a generalizable solution.","short_abstract":"Autonomous agents for long-sequence Graphical User Interface tasks are hindered by sparse rewards and the intractable credit assignment problem. To address these challenges, we introduce GUI-Shepherd, a Process Reward Model that provides dense, step-by-step feedback to guide agents. GUI-Shepherd is trained on a diverse...","url_abs":"https://arxiv.org/abs/2509.23738","url_pdf":"https://arxiv.org/pdf/2509.23738v1","authors":"[\"Cong Chen\",\"Kaixiang Ji\",\"Hao Zhong\",\"Muzhi Zhu\",\"Anzhou Li\",\"Guo Gan\",\"Ziyuan Huang\",\"Cheng Zou\",\"Jiajia Liu\",\"Jingdong Chen\",\"Hao Chen\",\"Chunhua Shen\"]","published":"2025-09-28T08:35:16Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[]","has_code":false}
