{"ID":2870051,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.14427","arxiv_id":"2509.14427","title":"Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models","abstract":"Information retrieval with compact binary embeddings, also referred to as hashing, is crucial for scalable fast search applications, yet state-of-the-art hashing methods require expensive, scenario-specific training. In this work, we introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful pretrained encoders that produce rich pretrained embeddings. We revisit classical, training-free hashing techniques: principal component analysis, random orthogonal projection, and threshold binarization, to produce a strong baseline for hashing. Our approach combines these techniques with frozen embeddings from state-of-the-art vision and audio encoders to yield competitive retrieval performance without any additional learning or fine-tuning. To demonstrate the generality and effectiveness of this approach, we evaluate it on standard image retrieval benchmarks as well as a newly introduced benchmark for audio hashing.","short_abstract":"Information retrieval with compact binary embeddings, also referred to as hashing, is crucial for scalable fast search applications, yet state-of-the-art hashing methods require expensive, scenario-specific training. In this work, we introduce Hashing-Baseline, a strong training-free hashing method leveraging powerful...","url_abs":"https://arxiv.org/abs/2509.14427","url_pdf":"https://arxiv.org/pdf/2509.14427v2","authors":"[\"Ilyass Moummad\",\"Kawtar Zaher\",\"Lukas Rauch\",\"Alexis Joly\"]","published":"2025-09-17T20:58:43Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.IR\"]","methods":"[]","has_code":false}
