{"ID":520753,"CreatedAt":"2026-03-04T20:59:09Z","UpdatedAt":"2026-03-04T20:59:09Z","DeletedAt":null,"paper_url":"https://paperswithcode.com/paper/stone-needle-a-general-multimodal-large-scale","arxiv_id":"2306.16034","title":"Stone Needle: A General Multimodal Large-scale Model Framework towards Healthcare","abstract":"In healthcare, multimodal data such as medical images and clinical reports is prevalent and must be comprehensively analyzed before diagnostic decisions are made. However, current large-scale artificial intelligence models predominantly focus on single-modal cognitive abilities and neglect the integration of multiple modalities. Therefore, we propose Stone Needle, a general multimodal large-scale model framework tailored explicitly for healthcare applications. Stone Needle serves as a comprehensive medical multimodal model foundation, integrating various modalities such as text, images, videos, and audio to surpass the limitations of single-modal systems. Through the framework components of intent analysis, medical foundation models, prompt manager, and medical language module, our architecture can perform multimodal interaction over multiple rounds of dialogue. Our method is a general multimodal large-scale model framework that integrates diverse modalities and can be tailored to specific tasks. The experimental results demonstrate the superior performance of our method compared to single-modal systems. The fusion of different modalities and the ability to process complex medical information in Stone Needle benefit accurate diagnosis, treatment recommendations, and patient care.","url_abs":"https://arxiv.org/abs/2306.16034v1","url_pdf":"https://arxiv.org/pdf/2306.16034v1.pdf","authors":"[\"Weihua Liu\", \"Yong Zuo\"]","published":"2023-06-28T00:00:00Z","tasks":"[\"Diagnostic\"]","methods":"[\"Focus\"]","has_code":false}
