{"ID":2825558,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.21302","arxiv_id":"2512.21302","title":"AndroidLens: Long-latency Evaluation with Nested Sub-targets for Android GUI Agents","abstract":"Graphical user interface (GUI) agents can substantially improve productivity by automating frequently executed long-latency tasks on mobile devices. However, existing evaluation benchmarks are still constrained to limited applications, simple tasks, and coarse-grained metrics. To address this, we introduce AndroidLens, a challenging evaluation framework for mobile GUI agents, comprising 571 long-latency tasks in both Chinese and English environments, each requiring an average of more than 26 steps to complete. The framework features: (1) tasks derived from real-world user scenarios across 38 domains, covering complex types such as multi-constraint, multi-goal, and domain-specific tasks; (2) static evaluation that preserves real-world anomalies and allows multiple valid paths to reduce bias; and (3) dynamic evaluation that employs a milestone-based scheme for fine-grained progress measurement via Average Task Progress (ATP). Our evaluation indicates that even the best models reach only a 12.7% task success rate and 50.47% ATP. We also underscore key challenges in real-world environments, including environmental anomalies, adaptive exploration, and long-term memory retention.","short_abstract":"Graphical user interface (GUI) agents can substantially improve productivity by automating frequently executed long-latency tasks on mobile devices. However, existing evaluation benchmarks are still constrained to limited applications, simple tasks, and coarse-grained metrics. To address this, we introduce AndroidLens,...","url_abs":"https://arxiv.org/abs/2512.21302","url_pdf":"https://arxiv.org/pdf/2512.21302v1","authors":"[\"Yue Cao\",\"Yingyao Wang\",\"Pi Bu\",\"Jingxuan Xing\",\"Wei Jiang\",\"Zekun Zhu\",\"Junpeng Ma\",\"Sashuai Zhou\",\"Tong Lu\",\"Jun Song\",\"Yu Cheng\",\"Yuning Jiang\",\"Bo Zheng\"]","published":"2025-12-24T17:40:42Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"LoRA\"]","has_code":false}
