{"ID":2834271,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.01210","arxiv_id":"2512.01210","title":"Knowledge Graph Augmented Large Language Models for Disease Prediction","abstract":"Electronic health records (EHRs) enable strong clinical prediction, but explanations are often coarse and hard to use for patient-level decisions. We propose a knowledge graph (KG)-guided chain-of-thought (CoT) framework for visit-level disease prediction on MIMIC-III. We map ICD-9 codes to PrimeKG, mine disease-relevant nodes and paths, and use these paths to scaffold temporally consistent CoT rationales, retaining only samples whose conclusions match observed outcomes. We fine-tune lightweight instruction-tuned LLMs (LLaMA-3.1-Instruct-8B and Gemma-7B) on two small cohorts (400 and 1,000 index visits) across ten PrimeKG-mapped diseases. Our models outperform strong classical baselines, reaching AUROC 0.66-0.70 and macro-AUPR 0.40-0.47. Without additional training, the models transfer zero-shot to the CRADLE cohort, improving accuracy from 0.40-0.51 to 0.72-0.77. In a blinded clinician study, KG-guided CoT rationales are consistently preferred for clarity, relevance, and correctness. Code is available at: https://github.com/JonathanWry/KG-guided-LLM-pipeline","short_abstract":"Electronic health records (EHRs) enable strong clinical prediction, but explanations are often coarse and hard to use for patient-level decisions. We propose a knowledge graph (KG)-guided chain-of-thought (CoT) framework for visit-level disease prediction on MIMIC-III. We map ICD-9 codes to PrimeKG, mine disease-releva...","url_abs":"https://arxiv.org/abs/2512.01210","url_pdf":"https://arxiv.org/pdf/2512.01210v3","authors":"[\"Ruiyu Wang\",\"Tuan Vinh\",\"Ran Xu\",\"Yuyin Zhou\",\"Jiaying Lu\",\"Carl Yang\",\"Francisco Pasquel\"]","published":"2025-12-01T02:49:17Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":606395,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2834271,"paper_url":"https://arxiv.org/abs/2512.01210","paper_title":"Knowledge Graph Augmented Large Language Models for Disease Prediction","repo_url":"https://github.com/JonathanWry/KG-guided-LLM-pipeline","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
