{"ID":2863628,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.24815","arxiv_id":"2509.24815","title":"Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets","abstract":"Sparse embeddings of data form an attractive class due to their inherent interpretability: Every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS) which finds, from a collection of vectors, the k vectors closest to a query. To encourage research on this underexplored topic, sparse ANNS featured prominently in a BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on large benchmark datasets by throughput and accuracy. In this work, we introduce a set of novel data structures and algorithmic methods, a combination of which leads to an elegant, effective, and highly efficient solution to sparse ANNS. Our contributions range from a theoretically-grounded sketching algorithm for sparse vectors to reduce their effective dimensionality while preserving inner product-induced ranks; a geometric organization of the inverted index; and the blending of local and global information to improve the efficiency and efficacy of ANNS. Empirically, our final algorithm, dubbed Seismic, reaches sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.","short_abstract":"Sparse embeddings of data form an attractive class due to their inherent interpretability: Every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS) which finds, from a collection...","url_abs":"https://arxiv.org/abs/2509.24815","url_pdf":"https://arxiv.org/pdf/2509.24815v1","authors":"[\"Sebastian Bruch\",\"Franco Maria Nardini\",\"Cosimo Rulli\",\"Rossano Venturini\"]","published":"2025-09-29T14:02:45Z","proceeding":"cs.DS","tasks":"[\"cs.DS\",\"cs.IR\",\"cs.LG\"]","methods":"[\"Generative Adversarial Network\"]","has_code":false}
