{"ID":2897760,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.05520","arxiv_id":"2507.05520","title":"Architecting Clinical Collaboration: Multi-Agent Reasoning Systems for Multimodal Medical VQA","abstract":"Dermatological care via telemedicine often lacks the rich context of in-person visits. Clinicians must make diagnoses based on a handful of images and brief descriptions, without the benefit of physical exams, second opinions, or reference materials. While many medical AI systems attempt to bridge these gaps with domain-specific fine-tuning, this work hypothesized that mimicking clinical reasoning processes could offer a more effective path forward. This study tested seven vision-language models on medical visual question answering across six configurations: baseline models, fine-tuned variants, and both augmented with either reasoning layers that combine multiple model perspectives, analogous to peer consultation, or retrieval-augmented generation that incorporates medical literature at inference time, serving a role similar to reference-checking. While fine-tuning degraded performance in four of seven models with an average 30% decrease, baseline models collapsed on test data. Clinical-inspired architectures, meanwhile, achieved up to 70% accuracy, maintaining performance on unseen data while generating explainable, literature-grounded outputs critical for clinical adoption. These findings demonstrate that medical AI succeeds by reconstructing the collaborative and evidence-based practices fundamental to clinical diagnosis.","short_abstract":"Dermatological care via telemedicine often lacks the rich context of in-person visits. Clinicians must make diagnoses based on a handful of images and brief descriptions, without the benefit of physical exams, second opinions, or reference materials. While many medical AI systems attempt to bridge these gaps with domai...","url_abs":"https://arxiv.org/abs/2507.05520","url_pdf":"https://arxiv.org/pdf/2507.05520v3","authors":"[\"Karishma Thakrar\",\"Shreyas Basavatia\",\"Akshay Daftardar\"]","published":"2025-07-07T22:31:56Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"RAG\",\"Language Model\"]","has_code":false}
