{"ID":2873692,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.05895","arxiv_id":"2509.05895","title":"BTCChat: Advancing Remote Sensing Bi-temporal Change Captioning with Multimodal Large Language Model","abstract":"Bi-temporal satellite imagery supports critical applications such as urbanization monitoring and disaster assessment. Although powerful multimodal large language models (MLLMs) have been applied in bi-temporal change analysis, previous methods process image pairs through direct concatenation, inadequately modeling temporal correlations and spatial semantic changes. This deficiency hampers visual-semantic alignment in change understanding, thereby constraining the overall effectiveness of current approaches. To address this gap, we propose BTCChat, a multi-temporal MLLM with advanced bi-temporal change understanding capability. BTCChat supports bi-temporal change captioning and retains single-image interpretation capability. To better capture temporal features and spatial semantic changes in image pairs, we design a Change Extraction module. Moreover, to enhance the model's attention to spatial details, we introduce a Prompt Augmentation mechanism, which incorporates contextual clues into the prompt to enhance model performance. Experimental results demonstrate that BTCChat achieves state-of-the-art performance on change captioning and visual question answering tasks. The code is available at https://github.com/IntelliSensing/BTCChat.","short_abstract":"Bi-temporal satellite imagery supports critical applications such as urbanization monitoring and disaster assessment. Although powerful multimodal large language models (MLLMs) have been applied in bi-temporal change analysis, previous methods process image pairs through direct concatenation, inadequately modeling temp...","url_abs":"https://arxiv.org/abs/2509.05895","url_pdf":"https://arxiv.org/pdf/2509.05895v2","authors":"[\"Yujie Li\",\"Wenjia Xu\",\"Yuanben Zhang\",\"Zhiwei Wei\",\"Mugen Peng\"]","published":"2025-09-07T02:16:18Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Language Model\"]","has_code":false,"code_links":[{"ID":610081,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2873692,"paper_url":"https://arxiv.org/abs/2509.05895","paper_title":"BTCChat: Advancing Remote Sensing Bi-temporal Change Captioning with Multimodal Large Language Model","repo_url":"https://github.com/IntelliSensing/BTCChat","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}