{"ID":2862381,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.01524","arxiv_id":"2510.01524","title":"WALT: Web Agents that Learn Tools","abstract":"Web agents promise to automate complex browser tasks, but current methods remain brittle -- relying on step-by-step UI interactions and heavy LLM reasoning that break under dynamic layouts and long horizons. Humans, by contrast, exploit website-provided functionality through high-level operations like search, filter, and sort. We introduce WALT (Web Agents that Learn Tools), a framework that reverse-engineers latent website functionality into reusable invocable tools. Rather than hypothesizing ad-hoc skills, WALT exposes robust implementations of automations already designed into websites -- spanning discovery (search, filter, sort), communication (post, comment, upvote), and content management (create, edit, delete). Tools abstract away low-level execution: instead of reasoning about how to click and type, agents simply call search(query) or create(listing). This shifts the computational burden from fragile step-by-step reasoning to reliable tool invocation. On VisualWebArena and WebArena, WALT achieves higher success with fewer steps and less LLM-dependent reasoning, establishing a robust and generalizable paradigm for browser automation.","short_abstract":"Web agents promise to automate complex browser tasks, but current methods remain brittle -- relying on step-by-step UI interactions and heavy LLM reasoning that break under dynamic layouts and long horizons. Humans, by contrast, exploit website-provided functionality through high-level operations like search, filter, a...","url_abs":"https://arxiv.org/abs/2510.01524","url_pdf":"https://arxiv.org/pdf/2510.01524v1","authors":"[\"Viraj Prabhu\",\"Yutong Dai\",\"Matthew Fernandez\",\"Jing Gu\",\"Krithika Ramakrishnan\",\"Yanqi Luo\",\"Silvio Savarese\",\"Caiming Xiong\",\"Junnan Li\",\"Zeyuan Chen\",\"Ran Xu\"]","published":"2025-10-01T23:41:47Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Large Language Model\"]","has_code":false}
