{"ID":2869017,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.16407","arxiv_id":"2509.16407","title":"WarpSpeed: A High-Performance Library for Concurrent GPU Hash Tables","abstract":"GPU hash tables are increasingly used to accelerate data processing, but their limited functionality restricts adoption in large-scale data processing applications. Current limitations include incomplete concurrency support and missing compound operations such as upserts. This paper presents WarpSpeed, a library of high-performance concurrent GPU hash tables with a unified benchmarking framework for performance analysis. WarpSpeed implements eight state-of-the-art Nvidia GPU hash table designs and provides a rich API designed for modern GPU applications. Our evaluation uses diverse benchmarks to assess both correctness and scalability, and we demonstrate real-world impact by integrating these hash tables into three downstream applications. We propose several optimization techniques to reduce concurrency overhead, including fingerprint-based metadata to minimize cache line probes and specialized Nvidia GPU instructions for lock-free queries. Our findings provide new insights into concurrent GPU hash table design and offer practical guidance for developing efficient, scalable data structures on modern GPUs.","short_abstract":"GPU hash tables are increasingly used to accelerate data processing, but their limited functionality restricts adoption in large-scale data processing applications. Current limitations include incomplete concurrency support and missing compound operations such as upserts. This paper presents WarpSpeed, a library of hig...","url_abs":"https://arxiv.org/abs/2509.16407","url_pdf":"https://arxiv.org/pdf/2509.16407v2","authors":"[\"Hunter McCoy\",\"Prashant Pandey\"]","published":"2025-09-19T20:31:38Z","proceeding":"cs.DC","tasks":"[\"cs.DC\",\"cs.DS\"]","methods":"[]","has_code":false}