{"ID":3083697,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-07T05:32:54.120957816Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.06217","arxiv_id":"2606.06217","title":"DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments","abstract":"When a disaster unfolds, responders must answer not only what is happening, but also why it is happening, what will happen next, and what to do now, often from noisy low-altitude UAV views and under tight on-site compute constraints. However, most existing multimodal benchmarks emphasize perception (e.g., recognition/description), cover limited disaster types, and provide insufficient support for the multi-stage reasoning required in practical emergency response. We introduce DisasterBench, a multi-stage multimodal reasoning benchmark for UAV-based disaster response in complex environments. DisasterBench spans 14 disaster-related scene types and 9 response-critical tasks across pre-, during-, and post-disaster stages, with fine-grained disaster-task mappings that explicitly test causal attribution, propagation prediction, damage analysis, and decision-oriented reasoning. To enable reasoning on the edge, we further propose DisasterVL, a lightweight multimodal model optimized with a three-stage pipeline combining domain instruction tuning, chain-of-thought-guided multimodal alignment, and reinforcement learning-based policy optimization. Experiments across 21 popular MLLMs show that our 2B-parameter DisasterVL outperforms all evaluated open-source models and substantially narrows the gap to state-of-the-art closed-source models, achieving GPT-4o-comparable reasoning accuracy with superior efficiency. The project page is available at https://github.com/TanmouTT/DisasterBench.","short_abstract":"When a disaster unfolds, responders must answer not only what is happening, but also why it is happening, what will happen next, and what to do now, often from noisy low-altitude UAV views and under tight on-site compute constraints. However, most existing multimodal benchmarks emphasize perception (e.g., recognition/d...","url_abs":"https://arxiv.org/abs/2606.06217","url_pdf":"https://arxiv.org/pdf/2606.06217v1","authors":"[\"Tan Zhang\",\"Quanyou Li\",\"Lu Zhang\",\"Jun Liu\",\"Xiaofeng Zhu\",\"Ping Hu\"]","published":"2026-06-04T14:31:11Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\",\"Large Language Model\"]","has_code":false,"code_links":[{"ID":612824,"CreatedAt":"2026-06-05T06:46:15.197025399Z","UpdatedAt":"2026-06-05T06:46:15.197025399Z","DeletedAt":null,"paper_id":3083697,"paper_url":"https://arxiv.org/abs/2606.06217","paper_title":"DisasterBench: A Multimodal Benchmark for UAV-Based Disaster Response in Complex Environments","repo_url":"https://github.com/TanmouTT/DisasterBench","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}