{"ID":3004905,"CreatedAt":"2026-06-03T03:09:48.883664427Z","UpdatedAt":"2026-06-05T10:38:01.117085634Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2606.03470","arxiv_id":"2606.03470","title":"Mixed-Modality Dual Face-Hair Retrieval","abstract":"We introduce Dual Face-Hair Retrieval (DFHR), a new mixed-modality dual-reference task in image retrieval where a query consists of a face image specifying identity and a hairstyle reference expressed as either an image or text. Unlike prior retrieval settings, DFHR requires cross-component reasoning between two semantically independent attributes -- identity and hairstyle -- originating from heterogeneous modalities. This formulation demands localized feature disentanglement, cross-modal semantic alignment, and mixed-modality composition within a unified embedding space. We construct DFHR-Bench, the first benchmark for mixed-modality face-hair retrieval, comprising over 180K annotated triplets across dual-image and image-text settings, built via a multi-stage annotation protocol ensuring semantic and identity integrity. We further propose MFHC (Multimodal Face-Hair Combiner), a unified framework that fuses disentangled identity and hairstyle embeddings through token injection and multi-view supervision. DFHR and DFHR-Bench together establish a new paradigm for identity-aware, attribute-controllable visual retrieval across modalities.","short_abstract":"We introduce Dual Face-Hair Retrieval (DFHR), a new mixed-modality dual-reference task in image retrieval where a query consists of a face image specifying identity and a hairstyle reference expressed as either an image or text. Unlike prior retrieval settings, DFHR requires cross-component reasoning between two semant...","url_abs":"https://arxiv.org/abs/2606.03470","url_pdf":"https://arxiv.org/pdf/2606.03470v1","authors":"[\"Quoc-Anh Bui-Huynh\",\"Mai-Tuyen Lam\",\"Dai-Anh-Tuan Nguyen\",\"Thanh Duc Ngo\"]","published":"2026-06-02T10:47:07Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
