{"ID":2833442,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.03667","arxiv_id":"2512.03667","title":"Colon-X: Advancing Intelligent Colonoscopy toward Clinical Reasoning","abstract":"In this study, we present Colon-X, an open initiative aimed at advancing multimodal intelligence in colonoscopy. We begin by constructing ColonVQA, the most comprehensive multimodal dataset ever built for colonoscopy, featuring over 1.1M+ visual question answering entries across 76 clinical findings and 18 multimodal tasks. Beyond serving as a community-wide data foundation, we further investigate a critical yet underexplored transition in colonoscopy - evolving from multimodal understanding to clinical reasoning: (a) To capture the current landscape of multimodal understanding behaviors, we systematically assess the generalizability of 22 multimodal large language models and examine their reliability under human-induced perturbations. The results reveal that clinical outputs from leading MLLMs remain far from robust and trustworthy. (b) To narrow this gap, we further explore reasoning-centric intelligence tailored for colonoscopy. Specifically, we curate ColonReason, a clinically grounded reasoning dataset annotated through a multi-agent debating pipeline, and develop ColonR1, the first R1-styled model that mitigates reward information collapse through task-adaptive rewards and gradient-stable policy optimization. Under data-scarce conditions, our ColonR1 achieves 56.61% overall accuracy, outperforming supervised fine-tuning by 25.22%, and sets a new reasoning-enabled baseline for multimodal colonoscopy analysis. All data and model resources are publicly available at https://github.com/ai4colonoscopy/Colon-X.","short_abstract":"In this study, we present Colon-X, an open initiative aimed at advancing multimodal intelligence in colonoscopy. \nWe begin by constructing ColonVQA, the most comprehensive multimodal dataset ever built for colonoscopy, featuring over 1.1M+ visual question answering entries across 76 clinical findings and 18 multimodal t...","url_abs":"https://arxiv.org/abs/2512.03667","url_pdf":"https://arxiv.org/pdf/2512.03667v2","authors":"[\"Ge-Peng Ji\",\"Jingyi Liu\",\"Deng-Ping Fan\",\"Huazhu Fu\",\"Nick Barnes\"]","published":"2025-12-03T10:55:07Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":606326,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2833442,"paper_url":"https://arxiv.org/abs/2512.03667","paper_title":"Colon-X: Advancing Intelligent Colonoscopy toward Clinical Reasoning","repo_url":"https://github.com/ai4colonoscopy/Colon-X","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
