{"ID":2839378,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.15120","arxiv_id":"2511.15120","title":"Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit","abstract":"In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\\boldsymbol{x})=g(\\boldsymbol{U}\\boldsymbol{x})$ with hidden subspace $\\boldsymbol{U}\\in \\mathbb{R}^{r\\times d}$, which is the canonical setup to study representation learning. We prove that under generic non-degenerate assumptions on the link function, a standard two-layer neural network trained via layer-wise gradient descent can agnostically learn the target with $o_d(1)$ test error using $\\widetilde{\\mathcal{O}}(d)$ samples and $\\widetilde{\\mathcal{O}}(d^2)$ time. The sample and time complexity both align with the information-theoretic limit up to leading order and are therefore optimal. During the first stage of gradient descent learning, the proof proceeds via showing that the inner weights can perform a power-iteration process. This process implicitly mimics a spectral start for the whole span of the hidden subspace and eventually eliminates finite-sample noise and recovers this span. It surprisingly indicates that optimal results can only be achieved if the first layer is trained for more than $\\mathcal{O}(1)$ steps. This work demonstrates the ability of neural networks to effectively learn hierarchical functions with respect to both sample and time efficiency.","short_abstract":"In deep learning, a central issue is to understand how neural networks efficiently learn high-dimensional features. To this end, we explore the gradient descent learning of a general Gaussian Multi-index model $f(\\boldsymbol{x})=g(\\boldsymbol{U}\\boldsymbol{x})$ with hidden subspace $\\boldsymbol{U}\\in \\mathbb{R}^{r\\time...","url_abs":"https://arxiv.org/abs/2511.15120","url_pdf":"https://arxiv.org/pdf/2511.15120v2","authors":"[\"Bohan Zhang\",\"Zihao Wang\",\"Hengyu Fu\",\"Jason D. Lee\"]","published":"2025-11-19T04:46:47Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.AI\",\"cs.IT\",\"cs.LG\",\"math.ST\"]","methods":"[]","has_code":false}
