{"ID":2834822,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.00756","arxiv_id":"2512.00756","title":"MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents","abstract":"Large Vision-Language Models (LVLMs) have shown strong potential as multilingual Graphical User Interface (GUI) agents, as evidenced by existing GUI benchmarks. However, these benchmarks exhibit two primary limitations: (1) although Perception and Reasoning (P\u0026R) capabilities are fundamental for GUI agents, current benchmarks lack fine-grained diagnostics to identify which specific capabilities lead to task failures, hindering targeted improvements; (2) existing benchmarks fail to provide a strictly aligned cross-lingual evaluation environment, introducing confounding factors that prevent isolating the language impact on GUI agent performance. To address these issues, we propose the Multilingual P\u0026R GUI Benchmark (MPR-GUI-Bench), featuring strictly aligned environments across six languages and eight fine-grained P\u0026R tasks. Our benchmark reveals consistent P\u0026R gaps between English and non-English settings, particularly on reasoning-intensive tasks. To leverage the superior English P\u0026R capabilities for bridging cross-lingual gaps, we identify layers sensitive to language and propose GUI-XLI, a GUI Cross-Lingual Intervention method that aligns non-English hidden states with their English counterparts at these layers during inference. Experiments show that GUI-XLI effectively reduces the cross-lingual gaps, with an average gain of 6.5% in non-English settings.","short_abstract":"Large Vision-Language Models (LVLMs) have shown strong potential as multilingual Graphical User Interface (GUI) agents, as evidenced by existing GUI benchmarks. However, these benchmarks exhibit two primary limitations: (1) although Perception and Reasoning (P\u0026R) capabilities are fundamental for GUI agents, current ben...","url_abs":"https://arxiv.org/abs/2512.00756","url_pdf":"https://arxiv.org/pdf/2512.00756v2","authors":"[\"Ruihan Chen\",\"Qiming Li\",\"Xiaocheng Feng\",\"Weihong Zhong\",\"Xiaoliang Yang\",\"Yuxuan Gu\",\"Zekun Zhou\",\"Yunfei Lu\",\"Haoyu Ren\",\"Kun Chen\",\"Dandan Tu\",\"Bing Qin\"]","published":"2025-11-30T06:47:33Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
