{"ID":2897354,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.04678","arxiv_id":"2507.04678","title":"ChangeBridge: Spatiotemporal Image Generation with Multimodal Controls for Remote Sensing","abstract":"Spatiotemporal image generation is a highly meaningful task, which can generate future scenes conditioned on given observations. However, existing change generation methods can only handle event-driven changes (e.g., new buildings) and fail to model cross-temporal variations (e.g., seasonal shifts). In this work, we propose ChangeBridge, a conditional spatiotemporal image generation model for remote sensing. Given pre-event images and multimodal event controls, ChangeBridge generates post-event scenes that are both spatially and temporally coherent. The core idea is a drift-asynchronous diffusion bridge. Specifically, it consists of three main modules: a) Composed Bridge Initialization, which replaces noise initialization. It starts the diffusion from a composed pre-event state, modeling a diffusion bridge process. b) Asynchronous Drift Diffusion, which uses a pixel-wise drift map, assigning different drift magnitudes to event and temporal evolution. This enables differentiated generation during the pre-to-post transition. c) Drift-Aware Denoising, which embeds the drift map into the denoising network, guiding drift-aware reconstruction. Experiments show that ChangeBridge can generate better cross-spatiotemporal aligned scenarios compared to state-of-the-art methods. Additionally, ChangeBridge shows great potential for land-use planning and as a data generation engine for a series of change detection tasks. Code is available at https://github.com/zhenghuizhao/ChangeBridge","short_abstract":"Spatiotemporal image generation is a highly meaningful task, which can generate future scenes conditioned on given observations. \nHowever, existing change generation methods can only handle event-driven changes (e.g., new buildings) and fail to model cross-temporal variations (e.g., seasonal shifts). In this work, we pr...","url_abs":"https://arxiv.org/abs/2507.04678","url_pdf":"https://arxiv.org/pdf/2507.04678v3","authors":"[\"Zhenghui Zhao\",\"Chen Wu\",\"Xiangyong Cao\",\"Di Wang\",\"Hongruixuan Chen\",\"Datao Tang\",\"Liangpei Zhang\",\"Zhuo Zheng\"]","published":"2025-07-07T05:51:55Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Diffusion Model\"]","has_code":false,"code_links":[{"ID":612333,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2897354,"paper_url":"https://arxiv.org/abs/2507.04678","paper_title":"ChangeBridge: Spatiotemporal Image Generation with Multimodal Controls for Remote Sensing","repo_url":"https://github.com/zhenghuizhao/ChangeBridge","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}