{"ID":2846224,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.02564","arxiv_id":"2511.02564","title":"Seeing Across Time and Views: Multi-Temporal Cross-View Learning for Robust Video Person Re-Identification","abstract":"Video-based person re-identification (ReID) in cross-view domains (for example, aerial-ground surveillance) remains an open problem because of extreme viewpoint shifts, scale disparities, and temporal inconsistencies. To address these challenges, we propose MTF-CVReID, a parameter-efficient framework that introduces seven complementary modules over a ViT-B/16 backbone. Specifically, we include: (1) Cross-Stream Feature Normalization (CSFN) to correct camera and view biases; (2) Multi-Resolution Feature Harmonization (MRFH) for scale stabilization across altitudes; (3) Identity-Aware Memory Module (IAMM) to reinforce persistent identity traits; (4) Temporal Dynamics Modeling (TDM) for motion-aware short-term temporal encoding; (5) Inter-View Feature Alignment (IVFA) for perspective-invariant representation alignment; (6) Hierarchical Temporal Pattern Learning (HTPL) to capture multi-scale temporal regularities; and (7) Multi-View Identity Consistency Learning (MVICL) that enforces cross-view identity coherence using a contrastive learning paradigm. Despite adding only about 2 million parameters and 0.7 GFLOPs over the baseline, MTF-CVReID maintains real-time efficiency (189 FPS) and achieves state-of-the-art performance on the AG-VPReID benchmark across all altitude levels, with strong cross-dataset generalization to G2A-VReID and MARS datasets. These results show that carefully designed adapter-based modules can substantially enhance cross-view robustness and temporal consistency without compromising computational efficiency. The source code is available at https://github.com/MdRashidunnabi/MTF-CVReID","short_abstract":"Video-based person re-identification (ReID) in cross-view domains (for example, aerial-ground surveillance) remains an open problem because of extreme viewpoint shifts, scale disparities, and temporal inconsistencies. To address these challenges, we propose MTF-CVReID, a parameter-efficient framework that introduces se...","url_abs":"https://arxiv.org/abs/2511.02564","url_pdf":"https://arxiv.org/pdf/2511.02564v1","authors":"[\"Md Rashidunnabi\",\"Kailash A. Hambarde\",\"Vasco Lopes\",\"Joao C. Neves\",\"Hugo Proenca\"]","published":"2025-11-04T13:37:59Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false,"code_links":[{"ID":607420,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2846224,"paper_url":"https://arxiv.org/abs/2511.02564","paper_title":"Seeing Across Time and Views: Multi-Temporal Cross-View Learning for Robust Video Person Re-Identification","repo_url":"https://github.com/MdRashidunnabi/MTF-CVReID","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}