{"ID":2826899,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.17213","arxiv_id":"2512.17213","title":"CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency","abstract":"Medical Vision-Language Models (VLMs) are prone to hallucinations, compromising clinical reliability. While reinforcement learning methods like Group Relative Policy Optimization (GRPO) offer a low-cost alignment solution, their reliance on sparse, outcome-based rewards inadvertently encourages models to \"overthink\" -- generating verbose, convoluted, and unverifiable Chain-of-Thought reasoning to justify answers. This focus on outcomes obscures factual errors and poses significant safety risks. To address this, we propose CheXPO-v2, a novel alignment framework that shifts from outcome to process supervision. Our core innovation is a Knowledge Graph Consistency Reward mechanism driven by Entity-Relation Matching. By explicitly parsing reasoning steps into structured \"Disease, Relation, Anatomy\" triplets, we provide fine-grained supervision that penalizes incoherent logic and hallucinations at the atomic level. Integrating this with a hard-example mining strategy, our approach significantly outperforms GRPO and state-of-the-art models on benchmarks like MIMIC-CXR-VQA. Crucially, CheXPO-v2 achieves new state-of-the-art accuracy using only 5k samples, demonstrating exceptional data efficiency while producing clinically sound and verifiable reasoning. The project source code is publicly available at: https://github.com/ecoxial2007/CheX-Phi4MM.","short_abstract":"Medical Vision-Language Models (VLMs) are prone to hallucinations, compromising clinical reliability. While reinforcement learning methods like Group Relative Policy Optimization (GRPO) offer a low-cost alignment solution, their reliance on sparse, outcome-based rewards inadvertently encourages models to \"overthink\" --...","url_abs":"https://arxiv.org/abs/2512.17213","url_pdf":"https://arxiv.org/pdf/2512.17213v1","authors":"[\"Xiao Liang\",\"Yuxuan An\",\"Di Wang\",\"Jiawei Hu\",\"Zhicheng Jiao\",\"Bin Jing\",\"Quan Wang\"]","published":"2025-12-19T03:50:42Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false,"code_links":[{"ID":605770,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2826899,"paper_url":"https://arxiv.org/abs/2512.17213","paper_title":"CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency","repo_url":"https://github.com/ecoxial2007/CheX-Phi4MM","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}