{"ID":2922131,"CreatedAt":"2026-06-02T02:42:49.606572591Z","UpdatedAt":"2026-06-02T15:47:14.09534485Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.00784","arxiv_id":"2606.00784","title":"DINO-GFSA: Geo-Localization via Semantic Gated Fusion and Mamba-based Sequential Aggregation","abstract":"Cross-view geo-localization (CVGL) is critical for Unmanned Aerial Vehicle (UAV) self-positioning and target localization in GNSS-denied environments. However, acquiring robust semantics while preserving fine-grained spatial details remains challenging. To address this, we propose DINO-GFSA, a framework leveraging a LoRA (Low-Rank Adaptation) adapted DINOv3 (ViT-L) backbone for parameter-efficient, high-capacity representation. Crucially, we introduce a Semantic Gated Residual Fusion module, which utilizes high-level semantics to selectively calibrate and integrate low-level spatial cues, effectively bridging the semantic gap. Furthermore, a Mamba-based Sequential Aggregation Head is designed to capture long-range spatial dependencies with linear complexity. Experiments demonstrate state-of-the-art performance on University-1652 and DenseUAV benchmarks, notably surpassing the previous best on DenseUAV by 3.48% on Recall@1. These results validate DINO-GFSA as a generalized, robust solution for UAV CVGL.","short_abstract":"Cross-view geo-localization (CVGL) is critical for Unmanned Aerial Vehicle (UAV) self-positioning and target localization in GNSS-denied environments. However, acquiring robust semantics while preserving fine-grained spatial details remains challenging. To address this, we propose DINO-GFSA, a framework leveraging a LoR...","url_abs":"https://arxiv.org/abs/2606.00784","url_pdf":"https://arxiv.org/pdf/2606.00784v1","authors":"[\"Beier Hu\",\"Yuanshen Guo\",\"Jialu Cai\",\"Chengwei Li\",\"Yong Wang\",\"Shunan Wu\",\"Zhigang Wu\"]","published":"2026-05-30T16:03:39Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"LoRA\"]","has_code":false}