{"ID":2838220,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.17964","arxiv_id":"2511.17964","title":"X-ReID: Multi-granularity Information Interaction for Video-Based Visible-Infrared Person Re-Identification","abstract":"Large-scale vision-language models (e.g., CLIP) have recently achieved remarkable performance in retrieval tasks, yet their potential for Video-based Visible-Infrared Person Re-Identification (VVI-ReID) remains largely unexplored. The primary challenges are narrowing the modality gap and leveraging spatiotemporal information in video sequences. To address the above issues, in this paper, we propose a novel cross-modality feature learning framework named X-ReID for VVI-ReID. Specifically, we first propose a Cross-modality Prototype Collaboration (CPC) to align and integrate features from different modalities, guiding the network to reduce the modality discrepancy. Then, a Multi-granularity Information Interaction (MII) is designed, incorporating short-term interactions from adjacent frames, long-term cross-frame information fusion, and cross-modality feature alignment to enhance temporal modeling and further reduce modality gaps. Finally, by integrating multi-granularity information, a robust sequence-level representation is achieved. Extensive experiments on two large-scale VVI-ReID benchmarks (i.e., HITSZ-VCM and BUPTCampus) demonstrate the superiority of our method over state-of-the-art methods. The source code is released at https://github.com/AsuradaYuci/X-ReID.","short_abstract":"Large-scale vision-language models (e.g., CLIP) have recently achieved remarkable performance in retrieval tasks, yet their potential for Video-based Visible-Infrared Person Re-Identification (VVI-ReID) remains largely unexplored. The primary challenges are narrowing the modality gap and leveraging spatiotemporal infor...","url_abs":"https://arxiv.org/abs/2511.17964","url_pdf":"https://arxiv.org/pdf/2511.17964v2","authors":"[\"Chenyang Yu\",\"Xuehu Liu\",\"Pingping Zhang\",\"Huchuan Lu\"]","published":"2025-11-22T07:57:15Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false,"code_links":[{"ID":606749,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2838220,"paper_url":"https://arxiv.org/abs/2511.17964","paper_title":"X-ReID: Multi-granularity Information Interaction for Video-Based Visible-Infrared Person Re-Identification","repo_url":"https://github.com/AsuradaYuci/X-ReID","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}