{"ID":2835873,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.22253","arxiv_id":"2511.22253","title":"UNION: A Lightweight Target Representation for Efficient Zero-Shot Image-Guided Retrieval with Optional Textual Queries","abstract":"Image-Guided Retrieval with Optional Text (IGROT) is a general retrieval setting where a query consists of an anchor image, with or without accompanying text, aiming to retrieve semantically relevant target images. This formulation unifies two major tasks: Composed Image Retrieval (CIR) and Sketch-Based Image Retrieval (SBIR). In this work, we address IGROT under low-data supervision by introducing UNION, a lightweight and generalisable target representation that fuses the image embedding with a null-text prompt. Unlike traditional approaches that rely on fixed target features, UNION enhances semantic alignment with multimodal queries while requiring no architectural modifications to pretrained vision-language models. With only 5,000 training samples - from LlavaSCo for CIR and Training-Sketchy for SBIR - our method achieves competitive results across benchmarks, including CIRCO mAP@50 of 38.5 and Sketchy mAP@200 of 82.7, surpassing many heavily supervised baselines. This demonstrates the robustness and efficiency of UNION in bridging vision and language across diverse query types.","short_abstract":"Image-Guided Retrieval with Optional Text (IGROT) is a general retrieval setting where a query consists of an anchor image, with or without accompanying text, aiming to retrieve semantically relevant target images. This formulation unifies two major tasks: Composed Image Retrieval (CIR) and Sketch-Based Image Retrieval...","url_abs":"https://arxiv.org/abs/2511.22253","url_pdf":"https://arxiv.org/pdf/2511.22253v1","authors":"[\"Hoang-Bao Le\",\"Allie Tran\",\"Binh T. Nguyen\",\"Liting Zhou\",\"Cathal Gurrin\"]","published":"2025-11-27T09:28:28Z","proceeding":"cs.IR","tasks":"[\"cs.IR\",\"cs.CV\"]","methods":"[\"Language Model\"]","has_code":false}
