{"ID":2842377,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.14788","arxiv_id":"2511.14788","title":"Subnational Geocoding of Global Disasters Using Large Language Models","abstract":"Subnational location data of disaster events are critical for risk assessment and disaster risk reduction. Disaster databases such as EM-DAT often report locations in unstructured textual form, with inconsistent granularity or spelling, that make it difficult to integrate with spatial datasets. We present a fully automated LLM-assisted workflow that processes and cleans textual location information using GPT-4o, and assigns geometries by cross-checking three independent geoinformation repositories: GADM, OpenStreetMap and Wikidata. Based on the agreement and availability of these sources, we assign a reliability score to each location while generating subnational geometries. Applied to the EM-DAT dataset from 2000 to 2024, the workflow geocodes 14,215 events across 17,948 unique locations. Unlike previous methods, our approach requires no manual intervention, covers all disaster types, enables cross-verification across multiple sources, and allows flexible remapping to preferred frameworks. Beyond the dataset, we demonstrate the potential of LLMs to extract and structure geographic information from unstructured text, offering a scalable and reliable method for related analyses.","short_abstract":"Subnational location data of disaster events are critical for risk assessment and disaster risk reduction. Disaster databases such as EM-DAT often report locations in unstructured textual form, with inconsistent granularity or spelling, that make it difficult to integrate with spatial datasets. We present a fully autom...","url_abs":"https://arxiv.org/abs/2511.14788","url_pdf":"https://arxiv.org/pdf/2511.14788v1","authors":"[\"Michele Ronco\",\"Damien Delforge\",\"Wiebke S. Jäger\",\"Christina Corbane\"]","published":"2025-11-13T17:04:18Z","proceeding":"cs.AI","tasks":"[\"cs.AI\",\"stat.AP\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false}
