{"ID":2894384,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.11260","arxiv_id":"2507.11260","title":"On Tight Robust Coresets for $k$-Medians Clustering","abstract":"This paper considers coresets for the robust $k$-medians problem with $m$ outliers, and new constructions in various metric spaces are obtained. Specifically, for metric spaces with a bounded VC or doubling dimension $d$, the coreset size is $O(m) + \\tilde{O}(kd\\varepsilon^{-2})$, which is optimal up to logarithmic factors. For Euclidean spaces, the coreset size is $O(m\\varepsilon^{-1}) + \\tilde{O}(\\min\\{k^{4/3}\\varepsilon^{-2},k\\varepsilon^{-3}\\})$, improving upon a recent result by Jiang and Lou (ICALP 2025). These results also extend to robust $(k,z)$-clustering, yielding, for VC and doubling dimension, a coreset size of $O(m) + \\tilde{O}(kd\\varepsilon^{-2z})$ with the optimal linear dependence on $m$. This extended result improves upon the earlier work of Huang et al. (SODA 2025). The techniques introduce novel dataset decompositions, enabling chaining arguments to be applied jointly across multiple components.","short_abstract":"This paper considers coresets for the robust $k$-medians problem with $m$ outliers, and new constructions in various metric spaces are obtained. Specifically, for metric spaces with a bounded VC or doubling dimension $d$, the coreset size is $O(m) + \\tilde{O}(kd\\varepsilon^{-2})$, which is optimal up to logarithmic fac...","url_abs":"https://arxiv.org/abs/2507.11260","url_pdf":"https://arxiv.org/pdf/2507.11260v1","authors":"[\"Lingxiao Huang\",\"Zhenyu Jiang\",\"Yi Li\",\"Xuan Wu\"]","published":"2025-07-15T12:33:19Z","proceeding":"cs.DS","tasks":"[\"cs.DS\",\"cs.CG\",\"cs.DM\"]","methods":"[]","has_code":false}
