{"ID":2845055,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2601.06031","arxiv_id":"2601.06031","title":"Beyond Clicking:A Step Towards Generalist GUI Grounding via Text Dragging","abstract":"Graphical user interface (GUI) grounding, the process of mapping human instructions to GUI actions, serves as a fundamental basis to autonomous GUI agents. While existing grounding models achieve promising performance to simulate the mouse click action on various click-based benchmarks, another essential mode of mouse interaction, namely dragging, remains largely underexplored. Yet, dragging the mouse to select and manipulate textual content represents a prevalent and important usage in practical GUI scenarios. To narrow this gap, we first introduce GUI-Drag, a diverse dataset of 161K text dragging examples synthesized through a scalable pipeline. To support systematic and robust evaluation, we further construct ScreenDrag, a benchmark with 5,333 examples spanning three levels of interface context, together with three dedicated metrics designed for assessing text dragging capability. Models trained on GUI-Drag with an efficient continual training strategy achieve substantial improvements on ScreenDrag, while preserving the original click-based performance on ScreenSpot, ScreenSpot-v2, and OSWorld-G. Our work encourages further research on broader GUI grounding beyond just clicking and paves way toward a truly generalist GUI grounding model. All benchmark, data, checkpoints, and code are open-sourced and available at https://osu-nlp-group.github.io/GUI-Drag.","short_abstract":"Graphical user interface (GUI) grounding, the process of mapping human instructions to GUI actions, serves as a fundamental basis to autonomous GUI agents. While existing grounding models achieve promising performance to simulate the mouse click action on various click-based benchmarks, another essential mode of mouse...","url_abs":"https://arxiv.org/abs/2601.06031","url_pdf":"https://arxiv.org/pdf/2601.06031v1","authors":"[\"Zeyi Liao\",\"Yadong Lu\",\"Boyu Gou\",\"Huan Sun\",\"Ahmed Awadallah\"]","published":"2025-11-07T19:40:09Z","proceeding":"cs.HC","tasks":"[\"cs.HC\",\"cs.AI\"]","methods":"[]","has_code":false}
