{"ID":2851376,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.20978","arxiv_id":"2510.20978","title":"A Geometric Analysis of PCA","abstract":"What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under the reconstruction loss. We obtain a non-asymptotic upper bound on the excess risk of PCA that recovers, in the large sample limit, our asymptotic characterization. Underlying our contributions is the following result: we prove that the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer of maximum rotation less than $π/4$.","short_abstract":"What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under t...","url_abs":"https://arxiv.org/abs/2510.20978","url_pdf":"https://arxiv.org/pdf/2510.20978v1","authors":"[\"Ayoub El Hanchi\",\"Murat Erdogdu\",\"Chris Maddison\"]","published":"2025-10-23T20:15:26Z","proceeding":"math.ST","tasks":"[\"math.ST\",\"stat.ML\"]","methods":"[]","has_code":false}
