{"ID":2870722,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.11543","arxiv_id":"2509.11543","title":"UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning","abstract":"Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable training on pre-collected trajectories, but struggles with multi-step task execution for lack of trajectory-level reward signals; online RL captures these signals through environment interaction, but suffers from sparse rewards and prohibitive deployment costs. To address this dilemma, we present Semi-online Reinforcement Learning, a novel paradigm that simulates online RL on offline trajectories. During each rollout process, we preserve the original model output within the multi-turn dialogue, where a Patch Module adaptively recovers the divergence between rollout and expert trajectories. To capture long-term training signals, Semi-online RL introduces discounted future returns into the reward computation and optimizes the policy with weighted step-level and episode-level advantages. We further introduce Semi-Online Performance (SOP), a metric that aligns better with true online performance, serving as a practical and effective proxy for real-world evaluation. Experiments show that our Semi-online RL achieves SOTA performance among 7B models across four dynamic benchmarks, with significant gains over the base model (e.g., +12.0% on AndroidWorld, +23.8% on AITW), demonstrating significant progress in bridging the gap between offline training efficiency and online multi-turn reasoning. 
The code is available at https://github.com/X-PLUG/MobileAgent/tree/main/UI-S1.","short_abstract":"Graphical User Interface (GUI) agents have demonstrated remarkable progress in automating complex user interface interactions through reinforcement learning. However, current approaches face a fundamental dilemma: offline RL enables stable training on pre-collected trajectories, but struggles with multi-step task execu...","url_abs":"https://arxiv.org/abs/2509.11543","url_pdf":"https://arxiv.org/pdf/2509.11543v2","authors":"[\"Zhengxi Lu\",\"Jiabo Ye\",\"Fei Tang\",\"Yongliang Shen\",\"Haiyang Xu\",\"Ziwei Zheng\",\"Weiming Lu\",\"Ming Yan\",\"Fei Huang\",\"Jun Xiao\",\"Yueting Zhuang\"]","published":"2025-09-15T03:24:08Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":609788,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2870722,"paper_url":"https://arxiv.org/abs/2509.11543","paper_title":"UI-S1: Advancing GUI Automation via Semi-online Reinforcement Learning","repo_url":"https://github.com/X-PLUG/MobileAgent","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
