{"ID":2866509,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.19980","arxiv_id":"2509.19980","title":"RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis","abstract":"Clinical diagnosis is a highly specialized discipline requiring both domain expertise and strict adherence to rigorous guidelines. While current AI-driven medical research predominantly focuses on knowledge graphs or natural text pretraining paradigms to incorporate medical knowledge, these approaches primarily rely on implicitly encoded knowledge within model parameters, neglecting task-specific knowledge required by diverse downstream tasks. To address this limitation, we propose Retrieval-Augmented Diagnosis (RAD), a novel framework that explicitly injects external knowledge into multimodal models directly on downstream tasks. Specifically, RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss that constrains the latent distance between multi-modal features and guideline knowledge, and the dual transformer decoder that employs guidelines as queries to steer cross-modal fusion, aligning the models with clinical diagnostic workflows from guideline acquisition to feature extraction and decision-making. Moreover, recognizing the lack of quantitative evaluation of interpretability for multimodal diagnostic models, we introduce a set of criteria to assess the interpretability from both image and text perspectives. Extensive evaluations across four datasets with different anatomies demonstrate RAD's generalizability, achieving state-of-the-art performance. Furthermore, RAD enables the model to concentrate more precisely on abnormal regions and critical indicators, ensuring evidence-based, trustworthy diagnosis. Our code is available at https://github.com/tdlhl/RAD.","short_abstract":"Clinical diagnosis is a highly specialized discipline requiring both domain expertise and strict adherence to rigorous guidelines. While current AI-driven medical research predominantly focuses on knowledge graphs or natural text pretraining paradigms to incorporate medical knowledge, these approaches primarily rely on...","url_abs":"https://arxiv.org/abs/2509.19980","url_pdf":"https://arxiv.org/pdf/2509.19980v2","authors":"[\"Haolin Li\",\"Tianjie Dai\",\"Zhe Chen\",\"Siyuan Du\",\"Jiangchao Yao\",\"Ya Zhang\",\"Yanfeng Wang\"]","published":"2025-09-24T10:36:14Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":609379,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2866509,"paper_url":"https://arxiv.org/abs/2509.19980","paper_title":"RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis","repo_url":"https://github.com/tdlhl/RAD","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}