{"ID":2827645,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.16755","arxiv_id":"2512.16755","title":"CitySeeker: How Do VLMS Explore Embodied Urban Navigation With Implicit Human Needs?","abstract":"Vision-Language Models (VLMs) have made significant progress in explicit instruction-based navigation; however, their ability to interpret implicit human needs (e.g., \"I am thirsty\") in dynamic urban environments remains underexplored. This paper introduces CitySeeker, a novel benchmark designed to assess VLMs' spatial reasoning and decision-making capabilities for exploring embodied urban navigation to address implicit needs. CitySeeker includes 6,440 trajectories across 8 cities, capturing diverse visual characteristics and implicit needs in 7 goal-driven scenarios. Extensive experiments reveal that even top-performing models (e.g., Qwen2.5-VL-32B-Instruct) achieve only 21.1% task completion. We find key bottlenecks in error accumulation in long-horizon reasoning, inadequate spatial cognition, and deficient experiential recall. To further analyze them, we investigate a series of exploratory strategies: Backtracking Mechanisms, Enriching Spatial Cognition, and Memory-Based Retrieval (BCR), inspired by human cognitive mapping's emphasis on iterative observation-reasoning cycles and adaptive path optimization. Our analysis provides actionable insights for developing VLMs with robust spatial intelligence required for tackling \"last-mile\" navigation challenges.","short_abstract":"Vision-Language Models (VLMs) have made significant progress in explicit instruction-based navigation; however, their ability to interpret implicit human needs (e.g., \"I am thirsty\") in dynamic urban environments remains underexplored. This paper introduces CitySeeker, a novel benchmark designed to assess VLMs' spatial...","url_abs":"https://arxiv.org/abs/2512.16755","url_pdf":"https://arxiv.org/pdf/2512.16755v1","authors":"[\"Siqi Wang\",\"Chao Liang\",\"Yunfan Gao\",\"Erxin Yu\",\"Sen Li\",\"Yushi Li\",\"Jing Li\",\"Haofen Wang\"]","published":"2025-12-18T16:53:12Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Language Model\",\"LoRA\"]","has_code":false}
