{"ID":2851219,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.20608","arxiv_id":"2510.20608","title":"Convergence Analysis of SGD under Expected Smoothness","abstract":"Stochastic gradient descent (SGD) is the workhorse of large-scale learning, yet classical analyses rely on assumptions that can be either too strong (bounded variance) or too coarse (uniform noise). The expected smoothness (ES) condition has emerged as a flexible alternative that ties the second moment of stochastic gradients to the objective value and the full gradient. This paper presents a self-contained convergence analysis of SGD under ES. We (i) refine ES with interpretations and sampling-dependent constants; (ii) derive bounds of the expectation of squared full gradient norm; and (iii) prove $O(1/K)$ rates with explicit residual errors for various step-size schedules. All proofs are given in full detail in the appendix. Our treatment unifies and extends recent threads (Khaled and Richtárik, 2020; Umeda and Iiduka, 2025).","short_abstract":"Stochastic gradient descent (SGD) is the workhorse of large-scale learning, yet classical analyses rely on assumptions that can be either too strong (bounded variance) or too coarse (uniform noise). The expected smoothness (ES) condition has emerged as a flexible alternative that ties the second moment of stochastic gr...","url_abs":"https://arxiv.org/abs/2510.20608","url_pdf":"https://arxiv.org/pdf/2510.20608v2","authors":"[\"Yuta Kawamoto\",\"Hideaki Iiduka\"]","published":"2025-10-23T14:39:57Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[]","has_code":false}
