{"ID":2867895,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18284","arxiv_id":"2509.18284","title":"Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction","abstract":"As medical diagnoses increasingly leverage multimodal data, machine learning models are expected to effectively fuse heterogeneous information while remaining robust to missing modalities. In this work, we propose a novel multimodal learning framework that integrates enhanced modalities dropout and contrastive learning to address real-world limitations such as modality imbalance and missingness. Our approach introduces learnable modality tokens for improving missingness-aware fusion of modalities and augments conventional unimodal contrastive objectives with fused multimodal representations. We validate our framework on large-scale clinical datasets for disease detection and prediction tasks, encompassing both visual and tabular modalities. Experimental results demonstrate that our method achieves state-of-the-art performance, particularly in challenging and practical scenarios where only a single modality is available. Furthermore, we show its adaptability through successful integration with a recent CT foundation model. Our findings highlight the effectiveness, efficiency, and generalizability of our approach for multimodal learning, offering a scalable, low-cost solution with significant potential for real-world clinical applications. The code is available at https://github.com/omron-sinicx/medical-modality-dropout.","short_abstract":"As medical diagnoses increasingly leverage multimodal data, machine learning models are expected to effectively fuse heterogeneous information while remaining robust to missing modalities. In this work, we propose a novel multimodal learning framework that integrates enhanced modalities dropout and contrastive learning...","url_abs":"https://arxiv.org/abs/2509.18284","url_pdf":"https://arxiv.org/pdf/2509.18284v1","authors":"[\"Yi Gu\",\"Kuniaki Saito\",\"Jiaxin Ma\"]","published":"2025-09-22T18:12:12Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":609523,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2867895,"paper_url":"https://arxiv.org/abs/2509.18284","paper_title":"Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction","repo_url":"https://github.com/omron-sinicx/medical-modality-dropout","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}