{"ID":2866963,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18693","arxiv_id":"2509.18693","title":"MVT: Mask-Grounded Vision-Language Models for Taxonomy-Aligned Land-Cover Tagging","abstract":"Land-cover understanding in remote sensing increasingly demands class-agnostic systems that generalize across datasets while remaining spatially precise and interpretable. We study a geometry-first discovery-and-interpretation setting under domain shift, where candidate regions are delineated class-agnostically and supervision avoids lexical class names via anonymized identifiers. Complementary to open-set recognition and open-world learning, we focus on coupling class-agnostic mask evidence with taxonomy-grounded scene interpretation, rather than unknown rejection or continual class expansion. We propose MVT, a three-stage framework that (i) extracts boundary-faithful region masks using SAM2 with domain adaptation, (ii) performs mask-grounded semantic tagging and scene description generation via dual-step LoRA fine-tuning of multimodal LLMs, and (iii) evaluates outputs with LLM-as-judge scoring calibrated by stratified expert ratings. On cross-dataset segmentation transfer (train on OpenEarthMap, evaluate on LoveDA), domain-adapted SAM2 improves mask quality; meanwhile, dual-step MLLM fine-tuning yields more accurate taxonomy-aligned tags and more informative mask-grounded scene descriptions.","short_abstract":"Land-cover understanding in remote sensing increasingly demands class-agnostic systems that generalize across datasets while remaining spatially precise and interpretable. We study a geometry-first discovery-and-interpretation setting under domain shift, where candidate regions are delineated class-agnostically and sup...","url_abs":"https://arxiv.org/abs/2509.18693","url_pdf":"https://arxiv.org/pdf/2509.18693v3","authors":"[\"Siyi Chen\",\"Kai Wang\",\"Weicong Pang\",\"Ruiming Yang\",\"Ziru Chen\",\"Renjun Gao\",\"Alexis Kai Hon Lau\",\"Dasa Gu\",\"Chenchen Zhang\",\"Cheng Li\"]","published":"2025-09-23T06:23:56Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}