{"ID":2852349,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.18798","arxiv_id":"2510.18798","title":"WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection","abstract":"Search agents have achieved significant advancements in enabling intelligent information retrieval and decision-making within interactive environments. Although reinforcement learning has been employed to train agentic models capable of more dynamic interactive retrieval, existing methods are limited by shallow tool-use depth and the accumulation of errors over multiple iterative interactions. In this paper, we present WebSeer, a more intelligent search agent trained via reinforcement learning enhanced with a self-reflection mechanism. Specifically, we construct a large dataset annotated with reflection patterns and design a two-stage training framework that unifies cold start and reinforcement learning within the self-reflection paradigm for real-world web-based environments, which enables the model to generate longer and more reflective tool-use trajectories. Our approach substantially extends tool-use chains and improves answer accuracy. Using a single 14B model, we achieve state-of-the-art results on HotpotQA and SimpleQA, with accuracies of 72.3% and 90.0%, respectively, and demonstrate strong generalization to out-of-distribution datasets. The code is available at https://github.com/99hgz/WebSeer","short_abstract":"Search agents have achieved significant advancements in enabling intelligent information retrieval and decision-making within interactive environments. Although reinforcement learning has been employed to train agentic models capable of more dynamic interactive retrieval, existing methods are limited by shallow tool-us...","url_abs":"https://arxiv.org/abs/2510.18798","url_pdf":"https://arxiv.org/pdf/2510.18798v1","authors":"[\"Guanzhong He\",\"Zhen Yang\",\"Jinxin Liu\",\"Bin Xu\",\"Lei Hou\",\"Juanzi Li\"]","published":"2025-10-21T16:52:00Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\"]","has_code":false,"code_links":[{"ID":607986,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2852349,"paper_url":"https://arxiv.org/abs/2510.18798","paper_title":"WebSeer: Training Deeper Search Agents through Reinforcement Learning with Self-Reflection","repo_url":"https://github.com/99hgz/WebSeer","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
