{"ID":2861137,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.03478","arxiv_id":"2510.03478","title":"How to Set $β_1, β_2$ in Adam: An Online Learning Perspective","abstract":"While Adam is one of the most effective optimizers for training large-scale machine learning models, a theoretical understanding of how to optimally set its momentum factors, $β_1$ and $β_2$, remains largely incomplete. Prior works have shown that Adam can be seen as an instance of Follow-the-Regularized-Leader (FTRL), one of the most important classes of algorithms in online learning. The prior analyses in these works required setting $β_1 = \\sqrt{β_2}$, which does not cover the more practical cases with $β_1 \\neq \\sqrt{β_2}$. We derive novel, more general analyses that hold for both $β_1 \\geq \\sqrt{β_2}$ and $β_1 \\leq \\sqrt{β_2}$. In both cases, our results strictly generalize the existing bounds. Furthermore, we show that our bounds are tight in the worst case. We also prove that setting $β_1 = \\sqrt{β_2}$ is optimal for an oblivious adversary, but sub-optimal for a non-oblivious adversary.","short_abstract":"While Adam is one of the most effective optimizers for training large-scale machine learning models, a theoretical understanding of how to optimally set its momentum factors, $β_1$ and $β_2$, remains largely incomplete. Prior works have shown that Adam can be seen as an instance of Follow-the-Regularized-Leader (FTRL),...","url_abs":"https://arxiv.org/abs/2510.03478","url_pdf":"https://arxiv.org/pdf/2510.03478v2","authors":"[\"Quan Nguyen\"]","published":"2025-10-03T19:54:38Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"math.OC\"]","methods":"[]","has_code":false}