{"ID":2859147,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.05573","arxiv_id":"2510.05573","title":"On the Theory of Continual Learning with Gradient Descent for Neural Networks","abstract":"Continual learning, the ability of a model to adapt to an ongoing sequence of tasks without forgetting earlier ones, is a central goal of artificial intelligence. To better understand its underlying mechanisms, we study the limitations of continual learning in a tractable yet representative setting. Specifically, we analyze one-hidden-layer quadratic neural networks trained by gradient descent on a sequence of XOR-cluster datasets with Gaussian noise, where different tasks correspond to clusters with orthogonal means. Our analysis is based on a tight characterization of gradient descent dynamics for the training loss, which yields explicit bounds on the rate of train-time forgetting as functions of the number of iterations, sample size, number of tasks, and hidden-layer width. We then leverage an algorithmic stability framework to bound the generalization gap, leading to corresponding guarantees on test-time forgetting. Together, our results provide the first closed-form guarantees for forgetting in continual learning with neural networks and show how key problem parameters jointly govern forgetting dynamics. Numerical experiments corroborate our theoretical results.","short_abstract":"Continual learning, the ability of a model to adapt to an ongoing sequence of tasks without forgetting earlier ones, is a central goal of artificial intelligence. To better understand its underlying mechanisms, we study the limitations of continual learning in a tractable yet representative setting. Specifically, we an...","url_abs":"https://arxiv.org/abs/2510.05573","url_pdf":"https://arxiv.org/pdf/2510.05573v2","authors":"[\"Hossein Taheri\",\"Avishek Ghosh\",\"Arya Mazumdar\"]","published":"2025-10-07T04:32:27Z","proceeding":"stat.ML","tasks":"[\"stat.ML\",\"cs.IT\",\"cs.LG\"]","methods":"[]","has_code":false}
