{"ID":2862946,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.26539","arxiv_id":"2509.26539","title":"Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents","abstract":"Developing autonomous agents that effectively interact with Graphic User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Utilizing techniques optimized for developing small models, we build our 3B Ferret-UI Lite agent through curating a diverse GUI data mixture from real and synthetic sources, strengthening inference-time performance through chain-of-thought reasoning and visual tool-use, and reinforcement learning with designed rewards. Ferret-UI Lite achieves competitive performance with other small-scale GUI agents. In GUI grounding, Ferret-UI Lite attains scores of $91.6\\%$, $53.3\\%$, and $61.2\\%$ on the ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI navigation, Ferret-UI Lite achieves success rates of $28.0\\%$ on AndroidWorld and $19.8\\%$ on OSWorld. We share our methods and lessons learned from developing compact, on-device GUI agents.","short_abstract":"Developing autonomous agents that effectively interact with Graphic User Interfaces (GUIs) remains a challenging open problem, especially for small on-device models. In this paper, we present Ferret-UI Lite, a compact, end-to-end GUI agent that operates across diverse platforms, including mobile, web, and desktop. Util...","url_abs":"https://arxiv.org/abs/2509.26539","url_pdf":"https://arxiv.org/pdf/2509.26539v1","authors":"[\"Zhen Yang\",\"Zi-Yi Dou\",\"Di Feng\",\"Forrest Huang\",\"Anh Nguyen\",\"Keen You\",\"Omar Attia\",\"Yuhao Yang\",\"Michael Feng\",\"Haotian Zhang\",\"Ram Ramrakhya\",\"Chao Jia\",\"Jeffrey Nichols\",\"Alexander Toshev\",\"Yinfei Yang\",\"Zhe Gan\"]","published":"2025-09-30T17:13:56Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.CL\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
