{"ID":2837844,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.19768","arxiv_id":"2511.19768","title":"Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering","abstract":"Large vision-language models (VLMs) have improved embodied question answering (EQA) agents by providing strong semantic priors for open-vocabulary reasoning. However, when used directly for step-level exploration, VLMs often exhibit frontier oscillations, unstable back-and-forth movements caused by overconfidence and miscalibration, leading to inefficient navigation and degraded answer quality. We propose Prune-Then-Plan, a simple and effective framework that stabilizes exploration through step-level calibration. Instead of trusting raw VLM scores, our method prunes implausible frontier choices using a Holm-Bonferroni inspired pruning procedure and then delegates final decisions to a coverage-based planner. This separation converts overconfident predictions into conservative, interpretable actions by relying on human-level judgments to calibrate the step-level behavior of VLMs. Integrated into the 3D-Mem EQA framework, our approach achieves relative improvements of up to 49% and 33% in visually grounded SPL and LLM-Match metrics respectively over baselines. Overall, our method achieves better scene coverage under equal exploration budgets on both OpenEQA and EXPRESS-Bench datasets.","short_abstract":"Large vision-language models (VLMs) have improved embodied question answering (EQA) agents by providing strong semantic priors for open-vocabulary reasoning. However, when used directly for step-level exploration, VLMs often exhibit frontier oscillations, unstable back-and-forth movements caused by overconfidence and m...","url_abs":"https://arxiv.org/abs/2511.19768","url_pdf":"https://arxiv.org/pdf/2511.19768v1","authors":"[\"Noah Frahm\",\"Prakrut Patel\",\"Yue Zhang\",\"Shoubin Yu\",\"Mohit Bansal\",\"Roni Sengupta\"]","published":"2025-11-24T22:50:50Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\",\"cs.RO\"]","methods":"[\"Large Language Model\",\"Language Model\",\"LoRA\"]","has_code":false}