{"ID":2827552,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.16501","arxiv_id":"2512.16501","title":"VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks","abstract":"GUI grounding is a critical component in building capable GUI agents. However, existing grounding benchmarks suffer from significant limitations: they either provide insufficient data volume and narrow domain coverage, or focus excessively on a single platform and require highly specialized domain knowledge. In this work, we present VenusBench-GD, a comprehensive, bilingual benchmark for GUI grounding that spans multiple platforms, enabling hierarchical evaluation for real-world applications. VenusBench-GD contributes as follows: (i) we introduce a large-scale, cross-platform benchmark with extensive coverage of applications, diverse UI elements, and rich annotated data, (ii) we establish a high-quality data construction pipeline for grounding tasks, achieving higher annotation accuracy than existing benchmarks, and (iii) we extend the scope of element grounding by proposing a hierarchical task taxonomy that divides grounding into basic and advanced categories, encompassing six distinct subtasks designed to evaluate models from complementary perspectives. Our experimental findings reveal critical insights: general-purpose multimodal models now match or even surpass specialized GUI models on basic grounding tasks. In contrast, advanced tasks still favor GUI-specialized models, though these models exhibit significant overfitting and poor robustness. These results underscore the necessity of comprehensive, multi-tiered evaluation frameworks.","short_abstract":"GUI grounding is a critical component in building capable GUI agents. However, existing grounding benchmarks suffer from significant limitations: they either provide insufficient data volume and narrow domain coverage, or focus excessively on a single platform and require highly specialized domain knowledge. In this wo...","url_abs":"https://arxiv.org/abs/2512.16501","url_pdf":"https://arxiv.org/pdf/2512.16501v1","authors":"[\"Beitong Zhou\",\"Zhexiao Huang\",\"Yuan Guo\",\"Zhangxuan Gu\",\"Tianyu Xia\",\"Zichen Luo\",\"Fei Tang\",\"Dehan Kong\",\"Yanyi Shang\",\"Suling Ou\",\"Zhenlin Guo\",\"Changhua Meng\",\"Shuheng Shen\"]","published":"2025-12-18T13:09:09Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
