{"ID":2884019,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.07142","arxiv_id":"2508.07142","title":"Why Does Stochastic Gradient Descent Slow Down in Low-Precision Training?","abstract":"Low-precision training has become crucial for reducing the computational and memory costs of large-scale deep learning. However, quantizing gradients introduces magnitude shrinkage, which can change how stochastic gradient descent (SGD) converges. In this study, we explore SGD convergence under a gradient shrinkage model, where each stochastic gradient is scaled by a factor \\( q_k \\in (0,1] \\). We show that this shrinkage replaces the usual stepsize \\( μ_k \\) with an effective stepsize \\( μ_k q_k \\), slowing convergence when \\( q_{\\min} \u003c 1 \\). With typical smoothness and bounded-variance assumptions, we prove that low-precision SGD still converges, but at a slower pace set by \\( q_{\\min} \\), and with a higher steady error level due to quantization effects. We analyze theoretically how lower numerical precision slows training by treating it as gradient shrinkage within the standard SGD convergence setup.","short_abstract":"Low-precision training has become crucial for reducing the computational and memory costs of large-scale deep learning. However, quantizing gradients introduces magnitude shrinkage, which can change how stochastic gradient descent (SGD) converges. In this study, we explore SGD convergence under a gradient shrinkage mod...","url_abs":"https://arxiv.org/abs/2508.07142","url_pdf":"https://arxiv.org/pdf/2508.07142v4","authors":"[\"Vincent-Daniel Yun\"]","published":"2025-08-10T02:25:48Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\",\"cs.IT\",\"math.NA\"]","methods":"[]","has_code":false}
