{"ID":2871905,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.10446","arxiv_id":"2509.10446","title":"DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL","abstract":"Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. Yet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supervised data. To address these challenges, we present DeepDive to advance deep search agents. First, we propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs. Second, we apply end-to-end multi-turn reinforcement learning (RL) to enhance LLMs' long-horizon reasoning with deep search. To encourage diversity and reduce redundancy, we design a redundancy penalty that discourages repeated similar queries. Experiments show that DeepDive-32B achieves a new open-source competitive result on BrowseComp, outperforming WebSailor, DeepSeek-R1-Browse, and Search-o1. We demonstrate that multi-turn RL training improves deep search ability and significantly contributes to the performance improvements across multiple benchmarks. We observe that DeepDive enables test-time scaling of tool calls and parallel sampling. All datasets, models, and code are publicly available at https://github.com/THUDM/DeepDive.","short_abstract":"Augmenting large language models (LLMs) with browsing tools substantially improves their potential as deep search agents to solve complex, real-world tasks. \nYet, open LLMs still perform poorly in such settings due to limited long-horizon reasoning capacity with browsing tools and the lack of sufficiently difficult supe...","url_abs":"https://arxiv.org/abs/2509.10446","url_pdf":"https://arxiv.org/pdf/2509.10446v2","authors":"[\"Rui Lu\",\"Zhenyu Hou\",\"Zihan Wang\",\"Hanchen Zhang\",\"Xiao Liu\",\"Yujiang Li\",\"Shi Feng\",\"Jie Tang\",\"Yuxiao Dong\"]","published":"2025-09-12T17:52:35Z","proceeding":"cs.CL","tasks":"[\"cs.CL\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609905,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2871905,"paper_url":"https://arxiv.org/abs/2509.10446","paper_title":"DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL","repo_url":"https://github.com/THUDM/DeepDive","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
