{"ID":3050118,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-06T10:22:36.014579446Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.04701","arxiv_id":"2606.04701","title":"Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms","abstract":"GUI agents today assume a static screen, where the world is frozen between two actions. However, real interfaces such as short-video applications violate this assumption, as their content keeps playing, and a competent user must decide what to watch and for how long. We formalize this task as Living-Screen-Native GUI agents and introduce LivingScreen, the first benchmark instantiating it on short-video platforms, with a faithful browser-based environment, a three-tier task suite, and metrics that jointly score accuracy and information efficiency. Evaluating extensive frontier models, we find that none reaches the human cost-accuracy performance, and that their dominant failure mode is over- and under-observation, pointing to observation control as a missing capability axis for future GUI agents. All data and code will be available at https://github.com/BITHLP/LivingScreen.","short_abstract":"GUI agents today assume a static screen, where the world is frozen between two actions. However, real interfaces such as short-video applications violate this assumption, as their content keeps playing, and a competent user must decide what to watch and for how long. We formalize this task as Living-Screen-Native GUI a...","url_abs":"https://arxiv.org/abs/2606.04701","url_pdf":"https://arxiv.org/pdf/2606.04701v1","authors":"[\"Jiashu Yao\",\"Heyan Huang\",\"Daiqing Wu\",\"Wangke Chen\",\"Huaxi Ai\",\"Haoyu Wen\",\"Zeming Liu\",\"Yuhang Guo\"]","published":"2026-06-03T10:25:46Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\"]","methods":"[]","has_code":false,"code_links":[{"ID":612784,"CreatedAt":"2026-06-04T02:13:16.786527022Z","UpdatedAt":"2026-06-04T02:13:16.786527022Z","DeletedAt":null,"paper_id":3050118,"paper_url":"https://arxiv.org/abs/2606.04701","paper_title":"Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms","repo_url":"https://github.com/BITHLP/LivingScreen","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
