{"ID":2841045,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12676","arxiv_id":"2511.12676","title":"BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections","abstract":"Deploying embodied agents that can answer questions about their surroundings in realistic real-world settings remains difficult, partly due to the scarcity of benchmarks for episodic memory Embodied Question Answering (EQA). Inspired by the challenges of infrastructure inspections, we propose Inspection EQA as a compelling problem class for advancing episodic memory EQA. It demands multi-scale reasoning and long-range spatial understanding, while offering standardized evaluation, professional inspection reports as grounding, and egocentric imagery. We introduce BridgeEQA, a benchmark of 2,200 open-vocabulary question-answer pairs (in the style of OpenEQA) grounded in professional inspection reports across 200 real-world bridge scenes with 47.93 images on average per scene. We further propose a new EQA metric Image Citation Relevance to evaluate the ability of a model to cite relevant images. Evaluations of state-of-the-art vision-language models reveal substantial performance gaps. To address this, we propose Embodied Memory Visual Reasoning (EMVR), which formulates the inspection EQA task as a Markov decision process. EMVR shows strong performance over the baselines. Code and dataset are available at https://drags99.github.io/bridge-eqa/","short_abstract":"Deploying embodied agents that can answer questions about their surroundings in realistic real-world settings remains difficult, partly due to the scarcity of benchmarks for episodic memory Embodied Question Answering (EQA). Inspired by the challenges of infrastructure inspections, we propose Inspection EQA as a compel...","url_abs":"https://arxiv.org/abs/2511.12676","url_pdf":"https://arxiv.org/pdf/2511.12676v2","authors":"[\"Subin Varghese\",\"Joshua Gao\",\"Asad Ur Rahman\",\"Vedhus Hoskere\"]","published":"2025-11-16T16:30:38Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false}
