{"ID":2880819,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.14040","arxiv_id":"2508.14040","title":"ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents","abstract":"We introduce ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. ComputerRL features the API-GUI paradigm, which unifies programmatic API calls and direct GUI interaction to address the inherent mismatch between machine agents and human-centric desktop environments. Scaling end-to-end RL training is crucial for improvement and generalization across diverse desktop tasks; however, it remains challenging due to environmental inefficiency and instability during extended training. To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale online RL. Furthermore, we propose Entropulse, a training strategy that alternates reinforcement learning with supervised fine-tuning, effectively mitigating entropy collapse during extended training runs. We employ ComputerRL on open models GLM-4-9B-0414 and GLM-4.1V-9B-Thinking, and evaluate them on the OSWorld benchmark. The AutoGLM-OS-9B achieves a new state-of-the-art accuracy of 48.9%, demonstrating significant improvements for general agents in desktop automation. Our code and the new OfficeWorld benchmark are available at https://github.com/thudm/ComputerRL. The algorithm and framework are adopted in building AutoGLM (Liu et al., 2024b).","short_abstract":"We introduce ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. ComputerRL features the API-GUI paradigm, which unifies programmatic API calls and direct GUI interaction to address the inherent mismatch between machine agents and human-centr...","url_abs":"https://arxiv.org/abs/2508.14040","url_pdf":"https://arxiv.org/pdf/2508.14040v2","authors":"[\"Hanyu Lai\",\"Xiao Liu\",\"Yanxiao Zhao\",\"Han Xu\",\"Hanchen Zhang\",\"Bohao Jing\",\"Yanyu Ren\",\"Shuntian Yao\",\"Yuxiao Dong\",\"Jie Tang\"]","published":"2025-08-19T17:59:45Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":610731,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2880819,"paper_url":"https://arxiv.org/abs/2508.14040","paper_title":"ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents","repo_url":"https://github.com/thudm/ComputerRL","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}