{"ID":2856901,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.10802","arxiv_id":"2510.10802","title":"MSCloudCAM: Multi-Scale Context Adaptation with Convolutional Cross-Attention for Multispectral Cloud Segmentation","abstract":"Clouds remain a major obstacle in optical satellite imaging, limiting accurate environmental and climate analysis. To address the strong spectral variability and the large scale differences among cloud types, we propose MSCloudCAM, a novel multi-scale context adapter network with convolution based cross-attention tailored for multispectral and multi-sensor cloud segmentation. A key contribution of MSCloudCAM is the explicit modeling of multiple complementary multi-scale context extractors. And also, rather than simply stacking or concatenating their outputs, our formulation uses one extractor's fine-resolution features and the other extractor's global contextual representations enabling dynamic, scale-aware feature selection. Building on this idea, we design a new convolution-based cross attention adapter that effectively fuses localized, detailed information with broader multi-scale context. Integrated with a hierarchical vision backbone and refined through channel and spatial attention mechanisms, MSCloudCAM achieves strong spectral-spatial discrimination. Experiments on various multisensor datatsets e.g. CloudSEN12 (Sentinel-2) and L8Biome (Landsat-8), demonstrate that MSCloudCAM achieves superior overall segmentation performance and competitive class-wise accuracy compared to recent state-of-the-art models, while maintaining competitive model complexity, highlighting the novelty and effectiveness of the proposed design for large-scale Earth observation.","short_abstract":"Clouds remain a major obstacle in optical satellite imaging, limiting accurate environmental and climate analysis. To address the strong spectral variability and the large scale differences among cloud types, we propose MSCloudCAM, a novel multi-scale context adapter network with convolution based cross-attention tailo...","url_abs":"https://arxiv.org/abs/2510.10802","url_pdf":"https://arxiv.org/pdf/2510.10802v4","authors":"[\"Md Abdullah Al Mazid\",\"Liangdong Deng\",\"Naphtali Rishe\"]","published":"2025-10-12T20:40:22Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[]","has_code":false}