{"ID":2844248,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.06201","arxiv_id":"2511.06201","title":"Scene-Aware Urban Design: A Human-AI Recommendation Framework Using Co-Occurrence Embeddings and Vision-Language Models","abstract":"This paper introduces a human-in-the-loop computer vision framework that uses generative AI to propose micro-scale design interventions in public space and support more continuous, local participation. Using Grounding DINO and a curated subset of the ADE20K dataset as a proxy for the urban built environment, the system detects urban objects and builds co-occurrence embeddings that reveal common spatial configurations. From this analysis, the user receives five statistically likely complements to a chosen anchor object. A vision language model then reasons over the scene image and the selected pair to suggest a third object that completes a more complex urban tactic. The workflow keeps people in control of selection and refinement and aims to move beyond top-down master planning by grounding choices in everyday patterns and lived experience.","short_abstract":"This paper introduces a human-in-the-loop computer vision framework that uses generative AI to propose micro-scale design interventions in public space and support more continuous, local participation. Using Grounding DINO and a curated subset of the ADE20K dataset as a proxy for the urban built environment, the system...","url_abs":"https://arxiv.org/abs/2511.06201","url_pdf":"https://arxiv.org/pdf/2511.06201v1","authors":"[\"Rodrigo Gallardo\",\"Oz Fishman\",\"Alexander Htet Kyaw\"]","published":"2025-11-09T03:24:10Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.HC\"]","methods":"[\"Language Model\"]","has_code":false}