{"ID":2879956,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.15746","arxiv_id":"2508.15746","title":"End-to-End Agentic RAG System Training for Traceable Diagnostic Reasoning","abstract":"The integration of Large Language Models (LLMs) into healthcare is constrained by knowledge limitations, hallucinations, and a disconnect from Evidence-Based Medicine (EBM). While Retrieval-Augmented Generation (RAG) offers a solution, current systems often rely on static workflows that miss the iterative, hypothetico-deductive reasoning of clinicians. To address this, we introduce Deep-DxSearch, an agentic RAG system trained end-to-end via reinforcement learning (RL) for traceable diagnostic reasoning. Deep-DxSearch acts as an active investigator, treating the LLM as an agent within an environment of 16,000+ guideline-derived disease profiles, 150,000+ patient records for case-based reasoning, and over 27 million biomedical documents. Using soft verifiable rewards that co-optimize retrieval and reasoning, the model learns to formulate queries, evaluate evidence, and refine searches to close diagnostic gaps. Experiments show our end-to-end RL framework consistently outperforms prompt-engineering and training-free RAG methods. On in-distribution (ID) and out-of-distribution (OOD) benchmarks for common and rare diseases, Deep-DxSearch surpasses strong baselines, including GPT-4o, DeepSeek-R1, and medical-specific frameworks, achieving an average accuracy gain of 22.7% over the second-best model. In validation with 150 real-world cases, Deep-DxSearch boosts physicians' average diagnostic accuracy from 45.6% to 69.1%. These results indicate that evolving agentic systems to leverage statistical regularities in large-scale healthcare data is key for trustworthy diagnostic assistants. All data, code, and checkpoints are available at https://qiaoyu-zheng.github.io/Deep-DxSearch.","short_abstract":"The integration of Large Language Models (LLMs) into healthcare is constrained by knowledge limitations, hallucinations, and a disconnect from Evidence-Based Medicine (EBM). While Retrieval-Augmented Generation (RAG) offers a solution, current systems often rely on static workflows that miss the iterative, hypothetico-...","url_abs":"https://arxiv.org/abs/2508.15746","url_pdf":"https://arxiv.org/pdf/2508.15746v2","authors":"[\"Qiaoyu Zheng\",\"Yuze Sun\",\"Chaoyi Wu\",\"Weike Zhao\",\"Pengcheng Qiu\",\"Yongguo Yu\",\"Kun Sun\",\"Jian Zhang\",\"Yanfeng Wang\",\"Ya Zhang\",\"Weidi Xie\"]","published":"2025-08-21T17:42:47Z","proceeding":"cs.CL","tasks":"[\"cs.CL\",\"cs.AI\",\"cs.CV\"]","methods":"[\"RAG\",\"Reinforcement Learning\",\"Large Language Model\",\"Language Model\"]","has_code":false}