{"ID":2845849,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.03554","arxiv_id":"2511.03554","title":"The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits","abstract":"Despite ongoing theoretical research on cross-validation (CV), many theoretical questions remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds in $k$-fold CV. Our results consist of a novel decomposition of the mean-squared error of cross-validation for risk estimation, which explicitly captures the correlations of error estimates across overlapping folds and includes a novel algorithmic stability notion, squared loss stability, that is considerably weaker than the typically required hypothesis stability in other comparable works. Furthermore, we prove: 1. For any learning algorithm that minimizes empirical risk, the mean-squared error of the $k$-fold cross-validation estimator $\\widehat{L}_{\\mathrm{CV}}^{(k)}$ of the population risk $L_{D}$ satisfies the following minimax lower bound: \\[ \\min_{k \\mid n} \\max_{D} \\mathbb{E}\\left[\\big(\\widehat{L}_{\\mathrm{CV}}^{(k)} - L_{D}\\big)^{2}\\right]=Ω\\big(\\sqrt{k^*}/n\\big), \\] where $n$ is the sample size, $k$ the number of folds, and $k^*$ denotes the number of folds attaining the minimax optimum. This shows that even under idealized conditions, for large values of $k$, CV cannot attain the optimum of order $1/n$ achievable by a validation set of size $n$, reflecting an inherent penalty caused by dependence between folds. 2. Complementing this, we exhibit learning rules for which \\[ \\max_{D}\\mathbb{E}\\!\\left[\\big(\\widehat{L}_{\\mathrm{CV}}^{(k)} - L_{D}\\big)^{2}\\right]=Ω(k/n), \\] matching (up to constants) the accuracy of a hold-out estimator of a single fold of size $n/k$. Together these results delineate the fundamental trade-off in resampling-based risk estimation: CV cannot fully exploit all $n$ samples for unbiased risk evaluation, and its minimax performance is pinned between the $k/n$ and $\\sqrt{k}/n$ regimes.","short_abstract":"Despite ongoing theoretical research on cross-validation (CV), many theoretical questions remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds in $k$-fold CV. Our results consist of a novel decomposition of the mean-square...","url_abs":"https://arxiv.org/abs/2511.03554","url_pdf":"https://arxiv.org/pdf/2511.03554v2","authors":"[\"Ido Nachum\",\"Rüdiger Urbanke\",\"Thomas Weinberger\"]","published":"2025-11-05T15:35:46Z","proceeding":"math.ST","tasks":"[\"math.ST\",\"cs.LG\"]","methods":"[]","has_code":false}