{"ID":2851616,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.19382","arxiv_id":"2510.19382","title":"A Derandomization Framework for Structure Discovery: Applications in Neural Networks and Beyond","abstract":"Understanding the dynamics of feature learning in neural networks (NNs) remains a significant challenge. The work of (Mousavi-Hosseini et al., 2023) analyzes a multiple index teacher-student setting and shows that a two-layer student attains a low-rank structure in its first-layer weights when trained with stochastic gradient descent (SGD) and a strong regularizer. This structural property is known to reduce sample complexity of generalization. Indeed, in a second step, the same authors establish algorithm-specific learning guarantees under additional assumptions. In this paper, we focus exclusively on the structure discovery aspect and study it under weaker assumptions, more specifically: we allow (a) NNs of arbitrary size and depth, (b) with all parameters trainable, (c) under any smooth loss function, (d) tiny regularization, and (e) trained by any method that attains a second-order stationary point (SOSP), e.g., perturbed gradient descent (PGD). At the core of our approach is a key $\\textit{derandomization}$ lemma, which states that optimizing the function $\\mathbb{E}_{\\mathbf{x}} \\left[g_\\theta(\\mathbf{W}\\mathbf{x} + \\mathbf{b})\\right]$ converges to a point where $\\mathbf{W} = \\mathbf{0}$, under mild conditions. The fundamental nature of this lemma directly explains structure discovery and has immediate applications in other domains including an end-to-end approximation for MAXCUT, and computing Johnson-Lindenstrauss embeddings.","short_abstract":"Understanding the dynamics of feature learning in neural networks (NNs) remains a significant challenge. The work of (Mousavi-Hosseini et al., 2023) analyzes a multiple index teacher-student setting and shows that a two-layer student attains a low-rank structure in its first-layer weights when trained with stochastic g...","url_abs":"https://arxiv.org/abs/2510.19382","url_pdf":"https://arxiv.org/pdf/2510.19382v2","authors":"[\"Nikos Tsikouras\",\"Yorgos Pantis\",\"Ioannis Mitliagkas\",\"Christos Tzamos\"]","published":"2025-10-22T08:55:00Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.LG\"]","methods":"[]","has_code":false}
