{"ID":2893289,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.12709","arxiv_id":"2507.12709","title":"From SGD to Spectra: A Theory of Neural Network Weight Dynamics","abstract":"Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear-we develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices. We derive exact SDEs showing that squared singular values follow Dyson Brownian motion with eigenvalue repulsion, and characterize stationary distributions as gamma-type densities with power-law tails, providing the first theoretical explanation for the empirically observed 'bulk+tail' spectral structure in trained networks. Through controlled experiments on transformer and MLP architectures, we validate our theoretical predictions and demonstrate quantitative agreement between SDE-based forecasts and observed spectral evolution, providing a rigorous foundation for understanding why deep learning works.","short_abstract":"Deep neural networks have revolutionized machine learning, yet their training dynamics remain theoretically unclear-we develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra...","url_abs":"https://arxiv.org/abs/2507.12709","url_pdf":"https://arxiv.org/pdf/2507.12709v2","authors":"[\"Brian Richard Olsen\",\"Sam Fatehmanesh\",\"Frank Xiao\",\"Adarsh Kumarappan\",\"Anirudh Gajula\"]","published":"2025-07-17T01:06:39Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Transformer\"]","has_code":false}