{"ID":2844254,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.06211","arxiv_id":"2511.06211","title":"Sparse Linear Regression is Easy on Random Supports","abstract":"Sparse linear regression is one of the most basic questions in machine learning and statistics. Here, we are given as input a design matrix $X \\in \\mathbb{R}^{N \\times d}$ and measurements or labels ${y} \\in \\mathbb{R}^N$ where ${y} = {X} {w}^* + ξ$, and $ξ$ is the noise in the measurements. Importantly, we have the additional constraint that the unknown signal vector ${w}^*$ is sparse: it has $k$ non-zero entries where $k$ is much smaller than the ambient dimension. Our goal is to output a prediction vector $\\widehat{w}$ that has small prediction error: $\\frac{1}{N}\\cdot \\|{X} {w}^* - {X} \\widehat{w}\\|^2_2$. Information-theoretically, we know what is best possible in terms of measurements: under most natural noise distributions, we can get prediction error at most $ε$ with roughly $N = O(k \\log d/ε)$ samples. Computationally, this currently needs $d^{Ω(k)}$ run-time. Alternately, with $N = O(d)$, we can get polynomial-time. Thus, there is an exponential gap (in the dependence on $d$) between the two and we do not know if it is possible to get $d^{o(k)}$ run-time and $o(d)$ samples. We give the first generic positive result for worst-case design matrices ${X}$: For any ${X}$, we show that if the support of ${w}^*$ is chosen at random, we can get prediction error $ε$ with $N = \\text{poly}(k, \\log d, 1/ε)$ samples and run-time $\\text{poly}(d,N)$. This run-time holds for any design matrix ${X}$ with condition number up to $2^{\\text{poly}(d)}$. Previously, such results were known for worst-case ${w}^*$, but only for random design matrices from well-behaved families, matrices that have a very low condition number ($\\text{poly}(\\log d)$; e.g., as studied in compressed sensing), or those with special structural properties.","short_abstract":"Sparse linear regression is one of the most basic questions in machine learning and statistics. Here, we are given as input a design matrix $X \\in \\mathbb{R}^{N \\times d}$ and measurements or labels ${y} \\in \\mathbb{R}^N$ where ${y} = {X} {w}^* + ξ$, and $ξ$ is the noise in the measurements. Importantly, we have the ad...","url_abs":"https://arxiv.org/abs/2511.06211","url_pdf":"https://arxiv.org/pdf/2511.06211v1","authors":"[\"Gautam Chandrasekaran\",\"Raghu Meka\",\"Konstantinos Stavropoulos\"]","published":"2025-11-09T03:48:21Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.DS\",\"math.ST\",\"stat.ML\"]","methods":"[]","has_code":false}