{"ID":2826542,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2512.18713","arxiv_id":"2512.18713","title":"Tight Lower Bounds and Optimal Algorithms for Stochastic Nonconvex Optimization with Heavy-Tailed Noise","abstract":"We study stochastic nonconvex optimization under heavy-tailed noise. In this setting, the stochastic gradients only have bounded $p$-th central moment ($p$-BCM) for some $p \\in (1,2]$. Building on the foundational work of Arjevani et al. (2022) in stochastic optimization, we establish tight sample complexity lower bounds for all first-order methods under \\emph{relaxed} mean-squared smoothness ($q$-WAS) and $δ$-similarity ($(q, δ)$-S) assumptions, allowing any exponent $q \\in [1,2]$ instead of the standard $q = 2$. These results substantially broaden the scope of existing lower bounds. To complement them, we show that Normalized Stochastic Gradient Descent with Momentum Variance Reduction (NSGD-MVR), a known algorithm, matches these bounds in expectation. Beyond expectation guarantees, we introduce a new algorithm, Double-Clipped NSGD-MVR, which allows the derivation of high-probability convergence rates under weaker assumptions than in previous works. Finally, for second-order methods with stochastic Hessians satisfying bounded $q$-th central moment assumptions for some exponent $q \\in [1, 2]$ (allowing $q \\neq p$), we establish sharper lower bounds than previous works while improving over Sadiev et al. (2025) (where only $p = q$ is considered) and yielding stronger convergence exponents. Together, these results provide a nearly complete complexity characterization of stochastic nonconvex optimization in heavy-tailed regimes.","short_abstract":"We study stochastic nonconvex optimization under heavy-tailed noise. In this setting, the stochastic gradients only have bounded $p$-th central moment ($p$-BCM) for some $p \\in (1,2]$. Building on the foundational work of Arjevani et al. (2022) in stochastic optimization, we establish tight sample complexity lower boun...","url_abs":"https://arxiv.org/abs/2512.18713","url_pdf":"https://arxiv.org/pdf/2512.18713v2","authors":"[\"Adrien Fradin\",\"Abdurakhmon Sadiev\",\"Laurent Condat\",\"Peter Richtárik\"]","published":"2025-12-21T12:19:23Z","proceeding":"math.OC","tasks":"[\"math.OC\"]","methods":"[]","has_code":false}
