{"ID":2850077,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.22728","arxiv_id":"2510.22728","title":"S-Chain: Structured Visual Chain-of-Thought For Medicine","abstract":"Faithful reasoning in medical vision-language models (VLMs) requires not only accurate predictions but also transparent alignment between textual rationales and visual evidence. While Chain-of-Thought (CoT) prompting has shown promise in medical visual question answering (VQA), no large-scale expert-level dataset has captured stepwise reasoning with precise visual grounding. We introduce S-Chain, the first large-scale dataset of 12,000 expert-annotated medical images with bounding boxes and structured visual CoT (SV-CoT), explicitly linking visual regions to reasoning steps. The dataset further supports 16 languages, totaling over 700k VQA pairs for broad multilingual applicability. Using S-Chain, we benchmark state-of-the-art medical VLMs (ExGra-Med, LLaVA-Med) and general-purpose VLMs (Qwen2.5-VL, InternVL2.5), showing that SV-CoT supervision significantly improves interpretability, grounding fidelity, and robustness. Beyond benchmarking, we study its synergy with retrieval-augmented generation, revealing how domain knowledge and visual grounding interact during autoregressive reasoning. Finally, we propose a new mechanism that strengthens the alignment between visual evidence and reasoning, improving both reliability and efficiency. S-Chain establishes a new benchmark for grounded medical reasoning and paves the way toward more trustworthy and explainable medical VLMs.","short_abstract":"Faithful reasoning in medical vision-language models (VLMs) requires not only accurate predictions but also transparent alignment between textual rationales and visual evidence. While Chain-of-Thought (CoT) prompting has shown promise in medical visual question answering (VQA), no large-scale expert-level dataset has c...","url_abs":"https://arxiv.org/abs/2510.22728","url_pdf":"https://arxiv.org/pdf/2510.22728v1","authors":"[\"Khai Le-Duc\",\"Duy M. H. Nguyen\",\"Phuong T. H. Trinh\",\"Tien-Phat Nguyen\",\"Nghiem T. Diep\",\"An Ngo\",\"Tung Vu\",\"Trinh Vuong\",\"Anh-Tien Nguyen\",\"Mau Nguyen\",\"Van Trung Hoang\",\"Khai-Nguyen Nguyen\",\"Hy Nguyen\",\"Chris Ngo\",\"Anji Liu\",\"Nhat Ho\",\"Anne-Christin Hauschild\",\"Khanh Xuan Nguyen\",\"Thanh Nguyen-Tang\",\"Pengtao Xie\",\"Daniel Sonntag\",\"James Zou\",\"Mathias Niepert\",\"Anh Totti Nguyen\"]","published":"2025-10-26T15:57:14Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CV\"]","methods":"[\"RAG\",\"Language Model\"]","has_code":false}