{"ID":2873388,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.06566","arxiv_id":"2509.06566","title":"Back To The Drawing Board: Rethinking Scene-Level Sketch-Based Image Retrieval","abstract":"The goal of Scene-level Sketch-Based Image Retrieval is to retrieve natural images matching the overall semantics and spatial layout of a free-hand sketch. Unlike prior work focused on architectural augmentations of retrieval models, we emphasize the inherent ambiguity and noise present in real-world sketches. This insight motivates a training objective that is explicitly designed to be robust to sketch variability. We show that with an appropriate combination of pre-training, encoder architecture, and loss formulation, it is possible to achieve state-of-the-art performance without the introduction of additional complexity. Extensive experiments on a challenging FS-COCO and widely-used SketchyCOCO datasets confirm the effectiveness of our approach and underline the critical role of training design in cross-modal retrieval tasks, as well as the need to improve the evaluation scenarios of scene-level SBIR.","short_abstract":"The goal of Scene-level Sketch-Based Image Retrieval is to retrieve natural images matching the overall semantics and spatial layout of a free-hand sketch. Unlike prior work focused on architectural augmentations of retrieval models, we emphasize the inherent ambiguity and noise present in real-world sketches. This ins...","url_abs":"https://arxiv.org/abs/2509.06566","url_pdf":"https://arxiv.org/pdf/2509.06566v1","authors":"[\"Emil Demić\",\"Luka Čehovin Zajc\"]","published":"2025-09-08T11:26:40Z","proceeding":"cs.CV","tasks":"[\"cs.CV\"]","methods":"[]","has_code":false}
