{"ID":2899511,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.00489","arxiv_id":"2507.00489","title":"Towards Efficient Random-Order Enumeration for Join Queries","abstract":"In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful representation of the whole join result, the result tuples are required to be enumerated in uniformly random order. However, existing studies lack an efficient random-order enumeration algorithm with a worst-case runtime guarantee for (cyclic) join queries. In this paper, we study the problem of enumerating the results of a join query in random order. We develop an efficient random-order enumeration algorithm for join queries with no large hidden constants in its complexity, achieving expected $O(\\frac{\\mathrm{AGM}(Q)}{|Res(Q)|}\\log^2|Q|)$ delay, $O(\\mathrm{AGM}(Q)\\log|Q|)$ total running time after $O(|Q|\\log|Q|)$-time index construction, where $|Q|$ is the size of input, $\\mathrm{AGM}(Q)$ is the AGM bound, and $|Res(Q)|$ is the size of the join result. We prove that our algorithm is near-optimal in the worst case, under the combinatorial $k$-clique hypothesis. Our algorithm requires no query-specific preprocessing and can be flexibly adapted to many common database indexes with only minor modifications. We also devise two non-trivial techniques to speed up the enumeration, and provide an experimental study on our enumeration algorithm along with the speed-up techniques. The experimental results show that our algorithm, enhanced with the proposed techniques, significantly outperforms existing state-of-the-art methods.","short_abstract":"In many data analysis pipelines, a basic and time-consuming process is to produce join results and feed them into downstream tasks. Numerous enumeration algorithms have been developed for this purpose. To be a statistically meaningful representation of the whole join result, the result tuples are required to be enumera...","url_abs":"https://arxiv.org/abs/2507.00489","url_pdf":"https://arxiv.org/pdf/2507.00489v1","authors":"[\"Pengyu Chen\",\"Zizheng Guo\",\"Jianwei Yang\",\"Dongjing Miao\"]","published":"2025-07-01T07:05:46Z","proceeding":"cs.DB","tasks":"[\"cs.DB\"]","methods":"[]","has_code":false}
