{"ID":2884451,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.07067","arxiv_id":"2508.07067","title":"Unbiased Insights: Optimal Streaming Algorithms for $\\ell_p$ Sampling, the Forget Model, and Beyond","abstract":"We study $\\ell_p$ sampling and frequency moment estimation in a single-pass insertion-only data stream. For $p \\in (0,2)$, we present a nearly space-optimal approximate $\\ell_p$ sampler that uses $\\widetilde{O}(\\log n \\log(1/δ))$ bits of space and for $p = 2$, we present a sampler with space complexity $\\widetilde{O}(\\log^2 n \\log(1/δ))$. This space complexity is optimal for $p \\in (0, 2)$ and improves upon prior work by a $\\log n$ factor. We further extend our construction to a continuous $\\ell_p$ sampler, which outputs a valid sample index at every point during the stream. Leveraging these samplers, we design nearly unbiased estimators for $F_p$ in data streams that include forget operations, which reset individual element frequencies and introduce significant non-linear challenges. As a result, we obtain near-optimal algorithms for estimating $F_p$ for all $p$ in this model, originally proposed by Pavan, Chakraborty, Vinodchandran, and Meel [PODS'24], resolving all three open problems they posed. Furthermore, we generalize this model to what we call the suffix-prefix deletion model, and extend our techniques to estimate entropy as a corollary of our moment estimation algorithms. Finally, we show how to handle arbitrary coordinate-wise functions during the stream, for any $g \\in \\mathbb{G}$, where $\\mathbb{G}$ includes all (linear or non-linear) contraction functions.","short_abstract":"We study $\\ell_p$ sampling and frequency moment estimation in a single-pass insertion-only data stream. For $p \\in (0,2)$, we present a nearly space-optimal approximate $\\ell_p$ sampler that uses $\\widetilde{O}(\\log n \\log(1/δ))$ bits of space and for $p = 2$, we present a sampler with space complexity $\\widetilde{O}(\\...","url_abs":"https://arxiv.org/abs/2508.07067","url_pdf":"https://arxiv.org/pdf/2508.07067v3","authors":"[\"Honghao Lin\",\"Hoai-An Nguyen\",\"William Swartworth\",\"David P. Woodruff\"]","published":"2025-08-09T18:15:42Z","proceeding":"cs.DS","tasks":"[\"cs.DS\"]","methods":"[]","has_code":false}
