{"ID":2840976,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.12564","arxiv_id":"2511.12564","title":"Linear time small coresets for k-mean clustering of segments with applications","abstract":"We study the $k$-means problem for a set $\\mathcal{S} \\subseteq \\mathbb{R}^d$ of $n$ segments, aiming to find $k$ centers $X \\subseteq \\mathbb{R}^d$ that minimize $D(\\mathcal{S},X) := \\sum_{S \\in \\mathcal{S}} \\min_{x \\in X} D(S,x)$, where $D(S,x) := \\int_{p \\in S} |p - x| dp$ measures the total distance from each point along a segment to a center. Variants of this problem include handling outliers, employing alternative distance functions such as M-estimators, weighting distances to achieve balanced clustering, or enforcing unique cluster assignments. For any $\\varepsilon \u003e 0$, an $\\varepsilon$-coreset is a weighted subset $C \\subseteq \\mathbb{R}^d$ that approximates $D(\\mathcal{S},X)$ within a factor of $1 \\pm \\varepsilon$ for any set of $k$ centers, enabling efficient streaming, distributed, or parallel computation. We propose the first coreset construction that provably handles arbitrary input segments. For constant $k$ and $\\varepsilon$, it produces a coreset of size $O(\\log^2 n)$ computable in $O(nd)$ time. Experiments, including a real-time video tracking application, demonstrate substantial speedups with minimal loss in clustering accuracy, confirming both the practical efficiency and theoretical guarantees of our method.","short_abstract":"We study the $k$-means problem for a set $\\mathcal{S} \\subseteq \\mathbb{R}^d$ of $n$ segments, aiming to find $k$ centers $X \\subseteq \\mathbb{R}^d$ that minimize $D(\\mathcal{S},X) := \\sum_{S \\in \\mathcal{S}} \\min_{x \\in X} D(S,x)$, where $D(S,x) := \\int_{p \\in S} |p - x| dp$ measures the total distance from each point...","url_abs":"https://arxiv.org/abs/2511.12564","url_pdf":"https://arxiv.org/pdf/2511.12564v2","authors":"[\"David Denisov\",\"Shlomi Dolev\",\"Dan Felmdan\",\"Michael Segal\"]","published":"2025-11-16T11:48:55Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CG\",\"cs.CV\"]","methods":"[]","has_code":false}
