{"ID":2898162,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.04103","arxiv_id":"2507.04103","title":"How to Train Your LLM Web Agent: A Statistical Diagnosis","abstract":"LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute costs required to post-train LLM-based web agents. To address this, we present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT), followed by on-policy reinforcement learning. We find this process highly sensitive to hyperparameter choices, making exhaustive sweeps impractical. To spare others from expensive trial-and-error, we sample 1,370 configurations and use bootstrapping to estimate effective hyperparameters. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++. Further, this strategy requires only 55% of the compute to match the peak performance of pure SFT on MiniWob++, effectively pushing the compute-performance Pareto frontier, and is the only strategy that can close the gap with closed-source models.","short_abstract":"LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions;...","url_abs":"https://arxiv.org/abs/2507.04103","url_pdf":"https://arxiv.org/pdf/2507.04103v4","authors":"[\"Dheeraj Vattikonda\",\"Santhoshi Ravichandran\",\"Emiliano Penaloza\",\"Hadi Nekoei\",\"Megh Thakkar\",\"Thibault Le Sellier de Chezelles\",\"Nicolas Gontier\",\"Miguel Muñoz-Mármol\",\"Sahar Omidi Shayegan\",\"Stefania Raimondo\",\"Xue Liu\",\"Alexandre Drouin\",\"Laurent Charlin\",\"Alexandre Piché\",\"Alexandre Lacoste\",\"Massimo Caccia\"]","published":"2025-07-05T17:12:33Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"cs.LG\",\"stat.ML\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\"]","has_code":false}
