WEBSERV: A Full-Stack and RL-Ready Web Environment for Training Web Agents at Scale

cs.LG arXiv:2510.16252
View PDF arXiv JSON

Abstract

Reinforcement learning (RL) for web agents demands environments that are both effective for evaluation and efficient enough for large-scale on-policy training. Current web environments fall short: server-side Docker setups are too resource-intensive for massive parallel rollouts, while browser-side interfaces produce noisy observations, execute actions unreliably under modern single-page applications, and omit visual interactivity cues. We introduce WebServ, a full-stack, RL-ready web environment that addresses these limitations end-to-end. On the server side, WebServ uses Incus containers with block-level copy-on-write, reducing launch latency by ~5x and persistent storage by ~240x, enabling 200+ concurrent isolated environments on a single host. On the browser side, WebServ provides a compact, site-agnostic observation and action interface derived automatically from the DOM with human-aligned interactivity cues, and a robust action execution backend using network-aware waiting for reliable SPA support. On WebArena-Lite, WebServ achieves state-of-the-art single-prompt results, with controlled comparisons confirming consistent gains across GPT-4o, OpenAI-o3, and Llama-3.1-8B over vanilla WebArena. We further train Qwen3-4B and Qwen3-30B-A3B with RL entirely within WebServ; the RL-trained 4B model achieves 55.5% mean accuracy, surpassing both Claude 4.5 Sonnet (50.0%) and the RL-trained 8B model from WebAgent-R1 (51.8%).

PDF Viewer