{"ID":2868860,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16017","arxiv_id":"2509.16017","title":"DistillMatch: Leveraging Knowledge Distillation from Vision Foundation Model for Multimodal Image Matching","abstract":"Multimodal image matching seeks pixel-level correspondences between images of different modalities, crucial for cross-modal perception, fusion and analysis. However, the significant appearance differences between modalities make this task challenging. Due to the scarcity of high-quality annotated datasets, existing deep learning methods that extract modality-common features for matching perform poorly and lack adaptability to diverse scenarios. Vision Foundation Model (VFM), trained on large-scale data, yields generalizable and robust feature representations adapted to data and tasks of various modalities, including multimodal matching. Thus, we propose DistillMatch, a multimodal image matching method using knowledge distillation from VFM. DistillMatch employs knowledge distillation to build a lightweight student model that extracts high-level semantic features from VFM (including DINOv2 and DINOv3) to assist matching across modalities. To retain modality-specific information, it extracts and injects modality category information into the other modality's features, which enhances the model's understanding of cross-modal correlations. Furthermore, we design V2I-GAN to boost the model's generalization by translating visible to pseudo-infrared images for data augmentation. Experiments show that DistillMatch outperforms existing algorithms on public datasets.","short_abstract":"Multimodal image matching seeks pixel-level correspondences between images of different modalities, crucial for cross-modal perception, fusion and analysis. However, the significant appearance differences between modalities make this task challenging. Due to the scarcity of high-quality annotated datasets, existing dee...","url_abs":"https://arxiv.org/abs/2509.16017","url_pdf":"https://arxiv.org/pdf/2509.16017v1","authors":"[\"Meng Yang\",\"Fan Fan\",\"Zizhuo Li\",\"Songchu Deng\",\"Yong Ma\",\"Jiayi Ma\"]","published":"2025-09-19T14:26:25Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Large Language Model\",\"Generative Adversarial Network\"]","has_code":false}