{"ID":2863998,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.00072","arxiv_id":"2510.00072","title":"Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards","abstract":"Training robust reasoning vision-language models (VLMs) in rare domains (such as geospatial) is fundamentally constrained by supervision scarcity. While raw geospatial imagery is abundant, the amount of task-direct supervision falls far behind that of common domains. In this work, we validate an important conclusion: indirect verifiable rewards, derived from seemingly unrelated metadata, are sufficient to induce sophisticated and generalizable geospatial reasoning across a wide range of downstream tasks (25+). We present Geo-R1 as one empirical instantiation of this paradigm. Rather than relying on limited task-specific annotations (i.e., direct rewards), Geo-R1 utilizes scalable, verifiable indirect proxy rewards based on cross-view alignment with metadata (geolocation information) to drive reinforcement learning at scale. Such indirect rewards successfully motivate the model to discover and internalize zero-shot geospatial reasoning across diverse tasks, achieving extraordinary zero-shot transfer on out-of-distribution benchmarks and even surpassing fully supervised specialists on certain benchmarks. These findings indicate that optimizing for indirect verifiable rewards may provide a scalable pathway to unlock generalized reasoning capabilities in rare domains with massive unlabeled data archives. Our code is available at: https://github.com/miniHuiHui/Geo-R1.","short_abstract":"Training robust reasoning vision-language models (VLMs) in rare domains (such as geospatial) is fundamentally constrained by supervision scarcity. While raw geospatial imagery is abundant, the amount of task-direct supervision falls far behind that of common domains. 
In this work, we validate an important conclusion: i...","url_abs":"https://arxiv.org/abs/2510.00072","url_pdf":"https://arxiv.org/pdf/2510.00072v2","authors":"[\"Chenhui Xu\",\"Fuxun Yu\",\"Michael J. Bianco\",\"Jacob Kovarskiy\",\"Raphael Tang\",\"Qi Zhang\",\"Zirui Xu\",\"Will LeVine\",\"Brandon Dubbs\",\"Heming Liao\",\"Cassandra Burgess\",\"Suvam Bag\",\"Jay Patravali\",\"Rupanjali Kukal\",\"Mikael Figueroa\",\"Rishi Madhok\",\"Nikolaos Karianakis\",\"Jinjun Xiong\"]","published":"2025-09-29T21:34:55Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.LG\"]","methods":"[\"Reinforcement Learning\",\"Language Model\"]","has_code":false,"code_links":[{"ID":609093,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2863998,"paper_url":"https://arxiv.org/abs/2510.00072","paper_title":"Unlocking Zero-Shot Geospatial Reasoning via Indirect Rewards","repo_url":"https://github.com/miniHuiHui/Geo-R1","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
