{"ID":2862613,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25944","arxiv_id":"2509.25944","title":"NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving","abstract":"Understanding risk in autonomous driving requires not only perception and prediction, but also high-level reasoning about agent behavior and context. Current Vision Language Model (VLM)-based methods primarily ground agents in static images and provide qualitative judgments, lacking the spatio-temporal reasoning needed to capture how risks evolve over time. To address this gap, we propose NuRisk, a comprehensive Visual Question Answering (VQA) dataset comprising 2.9K scenarios and 1.1M agent-level samples, built on real-world data from nuScenes and Waymo, completed with safety-critical scenarios from the CommonRoad simulator. The dataset provides Bird's-eye view (BEV) based sequential images with quantitative, agent-level risk annotations, enabling spatio-temporal reasoning. We benchmark well-known VLMs across different prompting techniques and find that they fail to perform explicit spatio-temporal reasoning, resulting in a peak accuracy of 33% at high latency. To address these shortcomings, our fine-tuned 7B VLM agent improves accuracy to 41% and reduces latency by 75%, demonstrating explicit spatio-temporal reasoning capabilities that proprietary models lacked. While this represents a significant step forward, the modest accuracy underscores the profound challenge of the task, establishing NuRisk as a critical benchmark for advancing spatio-temporal reasoning in autonomous driving. More information can be found at https://github.com/TUM-AVS/NuRisk.","short_abstract":"Understanding risk in autonomous driving requires not only perception and prediction, but also high-level reasoning about agent behavior and context. Current Vision Language Model (VLM)-based methods primarily ground agents in static images and provide qualitative judgments, lacking the spatio-temporal reasoning needed...","url_abs":"https://arxiv.org/abs/2509.25944","url_pdf":"https://arxiv.org/pdf/2509.25944v2","authors":"[\"Yuan Gao\",\"Mattia Piccinini\",\"Roberto Brusnicki\",\"Yuchen Zhang\",\"Johannes Betz\"]","published":"2025-09-30T08:37:31Z","proceeding":"cs.AI","tasks":"[\"cs.AI\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":608916,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2862613,"paper_url":"https://arxiv.org/abs/2509.25944","paper_title":"NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving","repo_url":"https://github.com/TUM-AVS/NuRisk","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}