{"ID":2887348,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.01778","arxiv_id":"2508.01778","title":"DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion","abstract":"Autonomous driving requires accurate scene understanding, including road geometry, traffic agents, and their semantic relationships. In online HD map generation scenarios, raster-based representations are well-suited to vision models but lack geometric precision, while graph-based representations retain structural detail but become unstable without precise maps. To harness the complementary strengths of both, we propose DiffSemanticFusion -- a fusion framework for multimodal trajectory prediction and planning. Our approach reasons over a semantic raster-fused BEV space, enhanced by a map diffusion module that improves both the stability and expressiveness of online HD map representations. We validate our framework on two downstream tasks: trajectory prediction and planning-oriented end-to-end autonomous driving. Experiments on real-world autonomous driving benchmarks, nuScenes and NAVSIM, demonstrate improved performance over several state-of-the-art methods. For the prediction task on nuScenes, we integrate DiffSemanticFusion with the online HD map informed QCNet, achieving a 5.1% performance improvement. For end-to-end autonomous driving in NAVSIM, DiffSemanticFusion achieves state-of-the-art results, with a 15% performance gain in NavHard scenarios. In addition, extensive ablation and sensitivity studies show that our map diffusion module can be seamlessly integrated into other vector-based approaches to enhance performance.\nAll artifacts are available at https://github.com/SunZhigang7/DiffSemanticFusion.","short_abstract":"Autonomous driving requires accurate scene understanding, including road geometry, traffic agents, and their semantic relationships. In online HD map generation scenarios, raster-based representations are well-suited to vision models but lack geometric precision, while graph-based representations retain structural deta...","url_abs":"https://arxiv.org/abs/2508.01778","url_pdf":"https://arxiv.org/pdf/2508.01778v1","authors":"[\"Zhigang Sun\",\"Yiru Wang\",\"Anqing Jiang\",\"Shuo Wang\",\"Yu Gao\",\"Yuwen Heng\",\"Shouyi Zhang\",\"An He\",\"Hao Jiang\",\"Jinhao Chai\",\"Zichong Gu\",\"Wang Jijun\",\"Shichen Tang\",\"Lavdim Halilaj\",\"Juergen Luettin\",\"Hao Sun\"]","published":"2025-08-03T14:32:05Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.RO\"]","methods":"[\"Diffusion Model\",\"Generative Adversarial Network\"]","has_code":false,"code_links":[{"ID":611426,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2887348,"paper_url":"https://arxiv.org/abs/2508.01778","paper_title":"DiffSemanticFusion: Semantic Raster BEV Fusion for Autonomous Driving via Online HD Map Diffusion","repo_url":"https://github.com/SunZhigang7/DiffSemanticFusion","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
