{"ID":2872008,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.09085","arxiv_id":"2509.09085","title":"IRDFusion: Iterative Relation-Map Difference guided Feature Fusion for Multispectral Object Detection","abstract":"Current multispectral object detection methods often retain extraneous background or noise during feature fusion, limiting perceptual performance. To address this, we propose an innovative feature fusion framework based on cross-modal feature contrastive and screening strategy, diverging from conventional approaches. The proposed method adaptively enhances salient structures by fusing object-aware complementary cross-modal features while suppressing shared background interference. Our solution centers on two novel, specially designed modules: the Mutual Feature Refinement Module (MFRM) and the Differential Feature Feedback Module (DFFM). The MFRM enhances intra- and inter-modal feature representations by modeling their relationships, thereby improving cross-modal alignment and discriminative power. Inspired by feedback differential amplifiers, the DFFM dynamically computes inter-modal differential features as guidance signals and feeds them back to the MFRM, enabling adaptive fusion of complementary information while suppressing common-mode noise across modalities. To enable robust feature learning, the MFRM and DFFM are integrated into a unified framework, which is formally formulated as an Iterative Relation-Map Differential Guided Feature Fusion mechanism, termed IRDFusion. IRDFusion enables high-quality cross-modal fusion by progressively amplifying salient relational signals through iterative feedback, while suppressing feature noise, leading to significant performance gains. \nIn extensive experiments on FLIR, LLVIP and M$^3$FD datasets, IRDFusion achieves state-of-the-art performance and consistently outperforms existing methods across diverse challenging scenarios, demonstrating its robustness and effectiveness. Code will be available at https://github.com/61s61min/IRDFusion.git.","short_abstract":"Current multispectral object detection methods often retain extraneous background or noise during feature fusion, limiting perceptual performance. To address this, we propose an innovative feature fusion framework based on cross-modal feature contrastive and screening strategy, diverging from conventional approaches. T...","url_abs":"https://arxiv.org/abs/2509.09085","url_pdf":"https://arxiv.org/pdf/2509.09085v2","authors":"[\"Jifeng Shen\",\"Haibo Zhan\",\"Xin Zuo\",\"Heng Fan\",\"Xiaohui Yuan\",\"Jun Li\",\"Wankou Yang\"]","published":"2025-09-11T01:22:35Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":609915,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2872008,"paper_url":"https://arxiv.org/abs/2509.09085","paper_title":"IRDFusion: Iterative Relation-Map Difference guided Feature Fusion for Multispectral Object Detection","repo_url":"https://github.com/61s61min/IRDFusion.git","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}