{"ID":2861572,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.02174","arxiv_id":"2510.02174","title":"Flatness-Aware Stochastic Gradient Langevin Dynamics","abstract":"Flatness of the loss landscape has been widely studied as an important perspective for understanding the behavior and generalization of deep learning algorithms. Motivated by this view, we propose Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), a first-order optimization method that biases learning its dynamics toward flat basins while retaining the computational and memory efficiency of SGD and SGLD. We provide a non-asymptotic theoretical analysis showing that fSGLD targets a flatness-biased Gibbs distribution under a theoretically prescribed coupling between the noise scale $σ$ and the inverse temperature $β$, together with explicit excess risk guarantees. We empirically evaluate fSGLD across standard optimizer benchmarks, Bayesian image classification, uncertainty quantification, and out-of-distribution detection, demonstrating consistently strong performance and reliable uncertainty estimates. Additional experiments confirm the effectiveness of the theoretically prescribed $β$-$σ$ coupling compared to decoupled choices.","short_abstract":"Flatness of the loss landscape has been widely studied as an important perspective for understanding the behavior and generalization of deep learning algorithms. \nMotivated by this view, we propose Flatness-Aware Stochastic Gradient Langevin Dynamics (fSGLD), a first-order optimization method that biases learning its dy...","url_abs":"https://arxiv.org/abs/2510.02174","url_pdf":"https://arxiv.org/pdf/2510.02174v3","authors":"[\"Stefano Bruno\",\"Youngsik Hwang\",\"Jaehyeon An\",\"Sotirios Sabanis\",\"Dong-Young Lim\"]","published":"2025-10-02T16:24:46Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"math.OC\",\"math.PR\",\"stat.ML\"]","methods":"[]","has_code":false}