{"ID":2872964,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.07553","arxiv_id":"2509.07553","title":"VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents","abstract":"With the rapid progress of multimodal large language models, operating system (OS) agents become increasingly capable of automating tasks through on-device graphical user interfaces (GUIs). However, most existing OS agents are designed for idealized settings, whereas real-world environments often present untrustworthy conditions. To mitigate risks of over-execution in such scenarios, we propose a query-driven human-agent-GUI interaction framework that enables OS agents to decide when to query humans for more reliable task completion. Built upon this framework, we introduce VeriOS-Agent, a trustworthy OS agent trained with a three-stage learning paradigm that falicitate the decoupling and utilization of meta-knowledge by supervised fine-tuning and group relative policy optimization. Concretely, VeriOS-Agent autonomously executes actions in normal conditions while proactively querying humans in untrustworthy scenarios. Experiments show that VeriOS-Agent improves the average step-wise success rate by 19.72\\% in over the strongest baselines, without compromising normal performance. VeriOS-Agent significantly improves performance in untrustworthy scenarios while maintaining comparable performance in trustworthy scenarios. Analysis highlights VeriOS-Agent's rationality, generalizability, and scalability. The codes, datasets and models are available at https://github.com/Wuzheng02/VeriOS.","short_abstract":"With the rapid progress of multimodal large language models, operating system (OS) agents become increasingly capable of automating tasks through on-device graphical user interfaces (GUIs). However, most existing OS agents are designed for idealized settings, whereas real-world environments often present untrustworthy...","url_abs":"https://arxiv.org/abs/2509.07553","url_pdf":"https://arxiv.org/pdf/2509.07553v3","authors":"[\"Zheng Wu\",\"Heyuan Huang\",\"Xingyu Lou\",\"Xiangmou Qu\",\"Pengzhou Cheng\",\"Zongru Wu\",\"Weiwen Liu\",\"Weinan Zhang\",\"Jun Wang\",\"Zhaoxiang Wang\",\"Zhuosheng Zhang\"]","published":"2025-09-09T09:46:01Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":610013,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2872964,"paper_url":"https://arxiv.org/abs/2509.07553","paper_title":"VeriOS: Query-Driven Proactive Human-Agent-GUI Interaction for Trustworthy OS Agents","repo_url":"https://github.com/Wuzheng02/VeriOS","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
