{"ID":2840549,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.13259","arxiv_id":"2511.13259","title":"GeoX-Bench: Benchmarking Cross-View Geo-Localization and Pose Estimation Capabilities of Large Multimodal Models","abstract":"Large multimodal models (LMMs) have demonstrated remarkable capabilities across a wide range of tasks, however their knowledge and abilities in the cross-view geo-localization and pose estimation domains remain unexplored, despite potential benefits for navigation, autonomous driving, outdoor robotics, \\textit{etc}. To bridge this gap, we introduce \\textbf{GeoX-Bench}, a comprehensive \\underline{Bench}mark designed to explore and evaluate the capabilities of LMMs in \\underline{cross}-view \\underline{Geo}-localization and pose estimation. Specifically, GeoX-Bench contains 10,859 panoramic-satellite image pairs spanning 128 cities in 49 countries, along with corresponding 755,976 question-answering (QA) pairs. Among these, 42,900 QA pairs are designated for benchmarking, while the remaining are intended to enhance the capabilities of LMMs. Based on GeoX-Bench, we evaluate the capabilities of 25 state-of-the-art LMMs on cross-view geo-localization and pose estimation tasks, and further explore the empowered capabilities of instruction-tuning. Our benchmark demonstrate that while current LMMs achieve impressive performance in geo-localization tasks, their effectiveness declines significantly on the more complex pose estimation tasks, highlighting a critical area for future improvement, and instruction-tuning LMMs on the training data of GeoX-Bench can significantly improve the cross-view geo-sense abilities. The GeoX-Bench is available at \\textcolor{magenta}{https://github.com/IntMeGroup/GeoX-Bench}.","short_abstract":"Large multimodal models (LMMs) have demonstrated remarkable capabilities across a wide range of tasks, however their knowledge and abilities in the cross-view geo-localization and pose estimation domains remain unexplored, despite potential benefits for navigation, autonomous driving, outdoor robotics, \\textit{etc}. To...","url_abs":"https://arxiv.org/abs/2511.13259","url_pdf":"https://arxiv.org/pdf/2511.13259v1","authors":"[\"Yushuo Zheng\",\"Jiangyong Ying\",\"Huiyu Duan\",\"Chunyi Li\",\"Zicheng Zhang\",\"Jing Liu\",\"Xiaohong Liu\",\"Guangtao Zhai\"]","published":"2025-11-17T11:19:07Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[]","has_code":false,"code_links":[{"ID":606979,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2840549,"paper_url":"https://arxiv.org/abs/2511.13259","paper_title":"GeoX-Bench: Benchmarking Cross-View Geo-Localization and Pose Estimation Capabilities of Large Multimodal Models","repo_url":"https://github.com/IntMeGroup/GeoX-Bench","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
