{"ID":2881595,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.11918","arxiv_id":"2508.11918","title":"ExploreVLM: Closed-Loop Robot Exploration Task Planning with Vision-Language Models","abstract":"The advancement of embodied intelligence is accelerating the integration of robots into daily life as human assistants. This evolution requires robots to not only interpret high-level instructions and plan tasks but also perceive and adapt within dynamic environments. Vision-Language Models (VLMs) present a promising solution by combining visual understanding and language reasoning. However, existing VLM-based methods struggle with interactive exploration, accurate perception, and real-time plan adaptation. To address these challenges, we propose ExploreVLM, a novel closed-loop task planning framework powered by Vision-Language Models (VLMs). The framework is built around a step-wise feedback mechanism that enables real-time plan adjustment and supports interactive exploration. At its core is a dual-stage task planner with self-reflection, enhanced by an object-centric spatial relation graph that provides structured, language-grounded scene representations to guide perception and planning. An execution validator supports the closed loop by verifying each action and triggering re-planning. Extensive real-world experiments demonstrate that ExploreVLM significantly outperforms state-of-the-art baselines, particularly in exploration-centric tasks. Ablation studies further validate the critical role of the reflective planner and structured perception in achieving robust and efficient task execution.","short_abstract":"The advancement of embodied intelligence is accelerating the integration of robots into daily life as human assistants. This evolution requires robots to not only interpret high-level instructions and plan tasks but also perceive and adapt within dynamic environments. Vision-Language Models (VLMs) present a promising s...","url_abs":"https://arxiv.org/abs/2508.11918","url_pdf":"https://arxiv.org/pdf/2508.11918v1","authors":"[\"Zhichen Lou\",\"Kechun Xu\",\"Zhongxiang Zhou\",\"Rong Xiong\"]","published":"2025-08-16T05:42:48Z","proceeding":"cs.RO","tasks":"[\"cs.RO\"]","methods":"[\"Language Model\",\"LoRA\"]","has_code":false}