{"ID":2859535,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.09664","arxiv_id":"2510.09664","title":"Semantic-Cohesive Knowledge Distillation for Deep Cross-modal Hashing","abstract":"Recently, deep supervised cross-modal hashing methods have achieve compelling success by learning semantic information in a self-supervised way. However, they still suffer from the key limitation that the multi-label semantic extraction process fail to explicitly interact with raw multimodal data, making the learned representation-level semantic information not compatible with the heterogeneous multimodal data and hindering the performance of bridging modality gap. To address this limitation, in this paper, we propose a novel semantic cohesive knowledge distillation scheme for deep cross-modal hashing, dubbed as SODA. Specifically, the multi-label information is introduced as a new textual modality and reformulated as a set of ground-truth label prompt, depicting the semantics presented in the image like the text modality. Then, a cross-modal teacher network is devised to effectively distill cross-modal semantic characteristics between image and label modalities and thus learn a well-mapped Hamming space for image modality. In a sense, such Hamming space can be regarded as a kind of prior knowledge to guide the learning of cross-modal student network and comprehensively preserve the semantic similarities between image and text modality. Extensive experiments on two benchmark datasets demonstrate the superiority of our model over the state-of-the-art methods.","short_abstract":"Recently, deep supervised cross-modal hashing methods have achieve compelling success by learning semantic information in a self-supervised way. However, they still suffer from the key limitation that the multi-label semantic extraction process fail to explicitly interact with raw multimodal data, making the learned re...","url_abs":"https://arxiv.org/abs/2510.09664","url_pdf":"https://arxiv.org/pdf/2510.09664v1","authors":"[\"Changchang Sun\",\"Vickie Chen\",\"Yan Yan\"]","published":"2025-10-07T18:07:02Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CV\",\"cs.IR\"]","methods":"[]","has_code":false}
