{"ID":2892791,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.14544","arxiv_id":"2507.14544","title":"Multimodal AI for Gastrointestinal Diagnostics: Tackling VQA in MEDVQA-GI 2025","abstract":"This paper describes our approach to Subtask 1 of the ImageCLEFmed MEDVQA 2025 Challenge, which targets visual question answering (VQA) for gastrointestinal endoscopy. We adopt the Florence model, a large-scale multimodal foundation model, as the backbone of our VQA pipeline, pairing a powerful vision encoder with a text encoder to interpret endoscopic images and produce clinically relevant answers. To improve generalization, we apply domain-specific augmentations that preserve medical features while increasing training diversity. Experiments on the Kvasir dataset show that fine-tuning Florence yields accurate responses on the official challenge metrics. Our results highlight the potential of large multimodal models in medical VQA and provide a strong baseline for future work on explainability, robustness, and clinical integration. The code is publicly available at: https://github.com/TiwariLaxuu/VQA-Florence.git","short_abstract":"This paper describes our approach to Subtask 1 of the ImageCLEFmed MEDVQA 2025 Challenge, which targets visual question answering (VQA) for gastrointestinal endoscopy. 
We adopt the Florence model, a large-scale multimodal foundation model, as the backbone of our VQA pipeline, pairing a powerful vision encoder with a text...","url_abs":"https://arxiv.org/abs/2507.14544","url_pdf":"https://arxiv.org/pdf/2507.14544v1","authors":"[\"Sujata Gaihre\",\"Amir Thapa Magar\",\"Prasuna Pokharel\",\"Laxmi Tiwari\"]","published":"2025-07-19T09:04:13Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":612017,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2892791,"paper_url":"https://arxiv.org/abs/2507.14544","paper_title":"Multimodal AI for Gastrointestinal Diagnostics: Tackling VQA in MEDVQA-GI 2025","repo_url":"https://github.com/TiwariLaxuu/VQA-Florence.git","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
