{"ID":2892836,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.14631","arxiv_id":"2507.14631","title":"$k$-PCA for (non-squared) Euclidean Distances: Polynomial Time Approximation","abstract":"Given an integer $k\\geq1$ and a set $P$ of $n$ points in $\\REAL^d$, the classic $k$-PCA (Principle Component Analysis) approximates the affine \\emph{$k$-subspace mean} of $P$, which is the $k$-dimensional affine linear subspace that minimizes its sum of squared Euclidean distances ($\\ell_{2,2}$-norm) over the points of $P$, i.e., the mean of these distances. The \\emph{$k$-subspace median} is the subspace that minimizes its sum of (non-squared) Euclidean distances ($\\ell_{2,1}$-mixed norm), i.e., their median. The median subspace is usually more sparse and robust to noise/outliers than the mean, but also much harder to approximate since, unlike the $\\ell_{z,z}$ (non-mixed) norms, it is non-convex for $k\u003cd-1$. We provide the first polynomial-time deterministic algorithm whose both running time and approximation factor are not exponential in $k$. More precisely, the multiplicative approximation factor is $\\sqrt{d}$, and the running time is polynomial in the size of the input. We expect that our technique would be useful for many other related problems, such as $\\ell_{2,z}$ norm of distances for $z\\not \\in \\br{1,2}$, e.g., $z=\\infty$, and handling outliers/sparsity. Open code and experimental results on real-world datasets are also provided.","short_abstract":"Given an integer $k\\geq1$ and a set $P$ of $n$ points in $\\REAL^d$, the classic $k$-PCA (Principle Component Analysis) approximates the affine \\emph{$k$-subspace mean} of $P$, which is the $k$-dimensional affine linear subspace that minimizes its sum of squared Euclidean distances ($\\ell_{2,2}$-norm) over the points of...","url_abs":"https://arxiv.org/abs/2507.14631","url_pdf":"https://arxiv.org/pdf/2507.14631v1","authors":"[\"Daniel Greenhut\",\"Dan Feldman\"]","published":"2025-07-19T14:00:50Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.CG\",\"cs.DS\"]","methods":"[]","has_code":false}
