{"ID":2848528,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.25387","arxiv_id":"2510.25387","title":"Instance-Level Composed Image Retrieval","abstract":"The progress of composed image retrieval (CIR), a popular research direction in image retrieval, where a combined visual and textual query is used, is held back by the absence of high-quality training and evaluation data. We introduce a new evaluation dataset, i-CIR, which, unlike existing datasets, focuses on an instance-level class definition. The goal is to retrieve images that contain the same particular object as the visual query, presented under a variety of modifications defined by textual queries. Its design and curation process keep the dataset compact to facilitate future research, while maintaining its challenge (comparable to retrieval among more than 40M random distractors) through a semi-automated selection of hard negatives. To overcome the challenge of obtaining clean, diverse, and suitable training data, we leverage pre-trained vision-and-language models (VLMs) in a training-free approach called BASIC. The method separately estimates query-image-to-image and query-text-to-image similarities, performing late fusion to upweight images that satisfy both queries, while down-weighting those that exhibit high similarity with only one of the two. Each individual similarity is further improved by a set of components that are simple and intuitive. BASIC sets a new state of the art not only on i-CIR but also on existing CIR datasets that follow a semantic-level class definition. Project page: https://vrg.fel.cvut.cz/icir/.","short_abstract":"The progress of composed image retrieval (CIR), a popular research direction in image retrieval, where a combined visual and textual query is used, is held back by the absence of high-quality training and evaluation data. We introduce a new evaluation dataset, i-CIR, which, unlike existing datasets, focuses on an insta...","url_abs":"https://arxiv.org/abs/2510.25387","url_pdf":"https://arxiv.org/pdf/2510.25387v2","authors":"[\"Bill Psomas\",\"George Retsinas\",\"Nikos Efthymiadis\",\"Panagiotis Filntisis\",\"Yannis Avrithis\",\"Petros Maragos\",\"Ondrej Chum\",\"Giorgos Tolias\"]","published":"2025-10-29T10:57:59Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[\"Language Model\"]","project_urls":"[\"https://vrg.fel.cvut.cz/icir/\"]","has_code":false,"code_links":[{"ID":607630,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_id":2848528,"paper_url":"https://arxiv.org/abs/2510.25387","paper_title":"Instance-Level Composed Image Retrieval","repo_url":"https://github.com/billpsomas/icir","is_official":false,"mentioned_in_paper":false,"mentioned_in_github":true,"github_stars":0}]}
