{"ID":2888159,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.01108","arxiv_id":"2508.01108","title":"Random-Access Ranked Retrieval and Similarity Search","abstract":"We extend Random Access, a fundamental operation that enables efficient search and exploration algorithms, to the modern interactive data systems based on Ranked Retrieval and Similarity Search, where orderings are dynamically defined over a high-dimensional feature space. This extension enables efficient solutions for a wide range of applications, from data analytics tools and database systems to recommendation systems and machine learning. We formalize the Random-Access Ranked Retrieval (RAR) problem, and extend it to Similarity Search. Our algorithmic innovations include the development of a theoretically efficient algorithm based on geometric arrangements, achieving logarithmic query time. However, this method suffers from exponential space complexity in high dimensions. Therefore, we develop a second class of algorithms based on $\\varepsilon$-sampling, which consume a linear space. Since exactly locating the tuple at a specific rank is challenging due to its connection to the range counting problem, we introduce a relaxed variant called $κ$-Random-Access Ranked Retrieval, which returns a small subset of size $κ$ guaranteed to contain the target tuple. To solve this problem efficiently, we define an intermediate problem, Stripe Range Retrieval (SRR), and design a hierarchical sampling data structure tailored for narrow stripe range queries. Our method achieves practical scalability in both data size and dimensionality. We prove near-optimal bounds on the efficiency of our algorithms and validate their performance through extensive experiments on real and synthetic datasets, demonstrating scalability to millions of tuples and hundreds of dimensions.","short_abstract":"We extend Random Access, a fundamental operation that enables efficient search and exploration algorithms, to the modern interactive data systems based on Ranked Retrieval and Similarity Search, where orderings are dynamically defined over a high-dimensional feature space. This extension enables efficient solutions for...","url_abs":"https://arxiv.org/abs/2508.01108","url_pdf":"https://arxiv.org/pdf/2508.01108v2","authors":"[\"Mohsen Dehghankar\",\"Abolfazl Asudeh\",\"Raghav Mittal\",\"Suraj Shetiya\",\"Gautam Das\"]","published":"2025-08-01T23:03:42Z","proceeding":"cs.DS","tasks":"[\"cs.DS\",\"cs.CG\",\"cs.DB\"]","methods":"[\"LoRA\"]","has_code":false}
