Automated Wildfire Damage Assessment from Multi-view Ground-level Imagery via Vision Language Models

Sep 2, 2025 cs.CV arXiv:2509.01895

Abstract

The escalating intensity and frequency of wildfires demand innovative computational methods for rapid and accurate property damage assessment. Traditional methods are often time-consuming, while modern computer vision approaches typically require extensive labeled datasets, hindering immediate post-disaster deployment. This research introduces a novel zero-shot framework that leverages pre-trained multimodal large language models (MLLMs) to classify damage from ground-level imagery. Using Generative Pre-trained Transformer 4o (GPT-4o) as the primary model, with comparative validation against Qwen2.5-Vision-Language-32-Billion-Instruct (Qwen), the research evaluates two pipelines applied to the 2025 Eaton and Palisades fires in California: an end-to-end inference method (Pipeline A) and a decoupled workflow in which visual cues drive text-based classification (Pipeline B). A primary contribution of this study is demonstrating the efficacy of MLLMs in synthesizing information from multiple perspectives. The findings show that while single-view assessments struggle to classify intermediate damage, a multi-view analysis yields dramatic improvements. To explore the impact of prompting methods, the research benchmarked baseline zero-shot and heuristic approaches against advanced reasoning strategies (Structured Chain-of-Thought and Self-Consistency). The results indicate that the simple prompting methods achieve accuracy comparable to that of the reasoning strategies.
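As a rough illustration of the Self-Consistency strategy named in the abstract, the sketch below samples a classifier several times on the same set of views and keeps the majority label. It is not the paper's implementation: the function `self_consistent_label`, the `classify` callable (a stand-in for whatever MLLM call is used), the damage labels, and the image file names are all hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def self_consistent_label(
    classify: Callable[[Sequence[str]], str],
    views: Sequence[str],
    n_samples: int = 5,
) -> Tuple[str, float]:
    """Majority-vote over repeated classifications of the same views.

    `classify` is a hypothetical stand-in for a stochastic model call that
    takes all views of one property and returns a single damage label.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")

    # Query the (assumed stochastic) classifier n_samples times.
    votes = Counter(classify(views) for _ in range(n_samples))

    # max() returns the first maximal key in insertion order,
    # so ties resolve to the label that was sampled first.
    label = max(votes, key=votes.get)
    return label, votes[label] / n_samples


if __name__ == "__main__":
    # Canned answers in place of a real model; labels are illustrative only.
    canned = iter(["major", "destroyed", "destroyed", "major", "destroyed"])
    label, share = self_consistent_label(
        lambda views: next(canned),
        ["front.jpg", "side.jpg"],  # hypothetical multi-view inputs
    )
    print(label, share)
```

The returned vote share gives a crude agreement signal; a low share could flag properties whose damage class the sampled answers disagree on.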
