{"ID":2856816,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.10665","arxiv_id":"2510.10665","title":"Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination","abstract":"We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d.\\ samples from a distribution $(x, y)$ on $\\mathbb{R}^d \\times \\mathbb{R}$ with $x \\sim \\mathcal{N}(0,\\mathbf{I}_d)$ and $y = x^\\top β+ z$, where $z$ is drawn independently of $x$ from an unknown distribution $E$. Moreover, $z$ satisfies $\\mathbb{P}_E[z = 0] = α\u003e0$. The goal is to accurately recover the regressor $β$ to small $\\ell_2$-error. Ignoring computational considerations, this problem is known to be solvable using $O(d/α)$ samples. On the other hand, the best known polynomial-time algorithms require $Ω(d/α^2)$ samples. Here we provide formal evidence that the quadratic dependence in $1/α$ is inherent for efficient algorithms. Specifically, we show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $\\tildeΩ(d^{1/2}/α^2)$.","short_abstract":"We study the task of noiseless linear regression under Gaussian covariates in the presence of additive oblivious contamination. Specifically, we are given i.i.d.\\ samples from a distribution $(x, y)$ on $\\mathbb{R}^d \\times \\mathbb{R}$ with $x \\sim \\mathcal{N}(0,\\mathbf{I}_d)$ and $y = x^\\top β+ z$, where $z$ is drawn...","url_abs":"https://arxiv.org/abs/2510.10665","url_pdf":"https://arxiv.org/pdf/2510.10665v1","authors":"[\"Ilias Diakonikolas\",\"Chao Gao\",\"Daniel M. Kane\",\"John Lafferty\",\"Ankit Pensia\"]","published":"2025-10-12T15:42:44Z","proceeding":"cs.DS","tasks":"[\"cs.DS\",\"math.ST\",\"stat.ML\"]","methods":"[]","has_code":false}
