{"ID":2863888,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.25351","arxiv_id":"2509.25351","title":"Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region","abstract":"We examine gradient descent in matrix factorization and show that under large step sizes the parameter space develops a fractal structure. We derive the exact critical step size for convergence in scalar-vector factorization and show that near criticality the selected minimizer depends sensitively on the initialization. Moreover, we show that adding regularization amplifies this sensitivity, generating a fractal boundary between initializations that converge and those that diverge. The analysis extends to general matrix factorization with orthogonal initialization. Our findings reveal that near-critical step sizes induce a chaotic regime of gradient descent where the training outcome is unpredictable and there are no simple implicit biases, such as towards balancedness, minimum norm, or flatness.","short_abstract":"We examine gradient descent in matrix factorization and show that under large step sizes the parameter space develops a fractal structure. We derive the exact critical step size for convergence in scalar-vector factorization and show that near criticality the selected minimizer depends sensitively on the initialization...","url_abs":"https://arxiv.org/abs/2509.25351","url_pdf":"https://arxiv.org/pdf/2509.25351v3","authors":"[\"Shuang Liang\",\"Guido Montúfar\"]","published":"2025-09-29T18:04:22Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"stat.ML\"]","methods":"[]","has_code":false}
