{"ID":2870988,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.12040","arxiv_id":"2509.12040","title":"Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing","abstract":"Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS), an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain, remains underexplored due to the absence of a unified evaluation benchmark and the domain gap between natural and RS images. To bridge these gaps, we first establish a standardized OVRSIS benchmark (\\textbf{OVRSISBench}) based on widely-used RS segmentation datasets, enabling consistent evaluation across methods. Using this benchmark, we comprehensively evaluate several representative OVS/OVRSIS models and reveal their limitations when directly applied to remote sensing scenarios. Building on these insights, we propose \\textbf{RSKT-Seg}, a novel open-vocabulary segmentation framework tailored for remote sensing. RSKT-Seg integrates three key components: (1) a Multi-Directional Cost Map Aggregation (RS-CMA) module that captures rotation-invariant visual cues by computing vision-language cosine similarities across multiple directions; (2) an Efficient Cost Map Fusion (RS-Fusion) transformer, which jointly models spatial and semantic dependencies with a lightweight dimensionality reduction strategy; and (3) a Remote Sensing Knowledge Transfer (RS-Transfer) module that injects pre-trained knowledge and facilitates domain adaptation via enhanced upsampling. Extensive experiments on the benchmark show that RSKT-Seg consistently outperforms strong OVS baselines by +3.8 mIoU and +5.9 mACC, while achieving 2x faster inference through efficient aggregation. Our code is \\href{https://github.com/LiBingyu01/RSKT-Seg}{\\textcolor{blue}{here}}.","short_abstract":"Open-Vocabulary Remote Sensing Image Segmentation (OVRSIS), an emerging task that adapts Open-Vocabulary Segmentation (OVS) to the remote sensing (RS) domain, remains underexplored due to the absence of a unified evaluation benchmark and the domain gap between natural and RS images. To bridge these gaps, we first estab...","url_abs":"https://arxiv.org/abs/2509.12040","url_pdf":"https://arxiv.org/pdf/2509.12040v2","authors":"[\"Bingyu Li\",\"Haocheng Dong\",\"Da Zhang\",\"Zhiyuan Zhao\",\"Junyu Gao\",\"Xuelong Li\"]","published":"2025-09-15T15:24:49Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Transformer\"]","has_code":false,"code_links":[{"ID":609817,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2870988,"paper_url":"https://arxiv.org/abs/2509.12040","paper_title":"Exploring Efficient Open-Vocabulary Segmentation in the Remote Sensing","repo_url":"https://github.com/LiBingyu01/RSKT-Seg","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
