{"ID":2835859,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.22235","arxiv_id":"2511.22235","title":"Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation","abstract":"The rapid development of large vision-language models (VLMs) has greatly advanced research on GUI agents. However, GUI agents still face significant challenges in handling long-horizon tasks. First, single-agent models struggle to balance high-level capabilities and low-level execution capability, facing prevalent issues of responsibility coupling and capability conflicts. Second, agents lack awareness of the task state, leading to progress loss in long-horizon tasks. To address these challenges, we propose a staged execution-feedback reinforcement learning algorithm. Rather than training a unified policy model, we focus on training high-level scheduling models. Specifically, we propose and train two agents: a Coordinator, responsible for strategic planning and task decomposition; and a State Tracker, responsible for context compression and information management to maintain the task's state and coherence. Based on this, we built the Coordinator-Executor-State Tracker (CES) multi-agent framework, which can be integrated with any low-level Executor model, assisting the Executor in solving long-horizon tasks through task scheduling and state management. Experiments on long-horizon task benchmarks demonstrate that CES significantly enhances the system's planning and state management capabilities. Furthermore, analysis confirms that our trained high-level scheduling module is a generalizable, plug-and-play module that significantly enhances the long-horizon capabilities of various Executors. Code is available at https://github.com/hehehahi4/CES.","short_abstract":"The rapid development of large vision-language models (VLMs) has greatly advanced research on GUI agents. However, GUI agents still face significant challenges in handling long-horizon tasks. First, single-agent models struggle to balance high-level capabilities and low-level execution capability, facing prevalent iss...","url_abs":"https://arxiv.org/abs/2511.22235","url_pdf":"https://arxiv.org/pdf/2511.22235v2","authors":"[\"Zehao Deng\",\"Tianjie Ju\",\"Zheng Wu\",\"Zhuosheng Zhang\",\"Gongshen Liu\"]","published":"2025-11-27T09:01:38Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false,"code_links":[{"ID":606548,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2835859,"paper_url":"https://arxiv.org/abs/2511.22235","paper_title":"Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation","repo_url":"https://github.com/hehehahi4/CES","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
