{"ID":2879056,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.16882","arxiv_id":"2508.16882","title":"Multimodal Medical Endoscopic Image Analysis via Progressive Disentangle-aware Contrastive Learning","abstract":"Accurate segmentation of laryngo-pharyngeal tumors is crucial for precise diagnosis and effective treatment planning. However, traditional single-modality imaging methods often fall short of capturing the complex anatomical and pathological features of these tumors. In this study, we present an innovative multi-modality representation learning framework based on the `Align-Disentangle-Fusion' mechanism that seamlessly integrates 2D White Light Imaging (WLI) and Narrow Band Imaging (NBI) pairs to enhance segmentation performance. A cornerstone of our approach is multi-scale distribution alignment, which mitigates modality discrepancies by aligning features across multiple transformer layers. Furthermore, a progressive feature disentanglement strategy is developed with the designed preliminary disentanglement and disentangle-aware contrastive learning to effectively separate modality-specific and shared features, enabling robust multimodal contrastive learning and efficient semantic fusion. Comprehensive experiments on multiple datasets demonstrate that our method consistently outperforms state-of-the-art approaches, achieving superior accuracy across diverse real clinical scenarios.","short_abstract":"Accurate segmentation of laryngo-pharyngeal tumors is crucial for precise diagnosis and effective treatment planning. However, traditional single-modality imaging methods often fall short of capturing the complex anatomical and pathological features of these tumors. \nIn this study, we present an innovative multi-modalit...","url_abs":"https://arxiv.org/abs/2508.16882","url_pdf":"https://arxiv.org/pdf/2508.16882v1","authors":"[\"Junhao Wu\",\"Yun Li\",\"Junhao Li\",\"Jingliang Bian\",\"Xiaomao Fan\",\"Wenbin Lei\",\"Ruxin Wang\"]","published":"2025-08-23T03:02:51Z","proceeding":"eess.IV","tasks":"[\"eess.IV\",\"cs.CV\"]","methods":"[\"Transformer\"]","has_code":false}