{"ID":2838221,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.17965","arxiv_id":"2511.17965","title":"Signal: Selective Interaction and Global-local Alignment for Multi-Modal Object Re-Identification","abstract":"Multi-modal object Re-IDentification (ReID) is devoted to retrieving specific objects through the exploitation of complementary multi-modal image information. Existing methods mainly concentrate on the fusion of multi-modal features, yet neglecting the background interference. Besides, current multi-modal fusion methods often focus on aligning modality pairs but suffer from multi-modal consistency alignment. To address these issues, we propose a novel selective interaction and global-local alignment framework called Signal for multi-modal object ReID. Specifically, we first propose a Selective Interaction Module (SIM) to select important patch tokens with intra-modal and inter-modal information. These important patch tokens engage in the interaction with class tokens, thereby yielding more discriminative features. Then, we propose a Global Alignment Module (GAM) to simultaneously align multi-modal features by minimizing the volume of 3D polyhedra in the gramian space. Meanwhile, we propose a Local Alignment Module (LAM) to align local features in a shift-aware manner. With these modules, our proposed framework could extract more discriminative features for object ReID. Extensive experiments on three multi-modal object ReID benchmarks (i.e., RGBNT201, RGBNT100, MSVR310) validate the effectiveness of our method. The source code is available at https://github.com/010129/Signal.","short_abstract":"Multi-modal object Re-IDentification (ReID) is devoted to retrieving specific objects through the exploitation of complementary multi-modal image information. Existing methods mainly concentrate on the fusion of multi-modal features, yet neglecting the background interference. Besides, current multi-modal fusion method...","url_abs":"https://arxiv.org/abs/2511.17965","url_pdf":"https://arxiv.org/pdf/2511.17965v1","authors":"[\"Yangyang Liu\",\"Yuhao Wang\",\"Pingping Zhang\"]","published":"2025-11-22T07:58:46Z","proceeding":"cs.CV","tasks":"[\"cs.CV\",\"cs.MM\"]","methods":"[]","has_code":false,"code_links":[{"ID":606750,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2838221,"paper_url":"https://arxiv.org/abs/2511.17965","paper_title":"Signal: Selective Interaction and Global-local Alignment for Multi-Modal Object Re-Identification","repo_url":"https://github.com/010129/Signal","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
