{"ID":2843095,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2511.07767","arxiv_id":"2511.07767","title":"Taking the Road Less Scheduled with Adaptive Polyak Steps","abstract":"Schedule-Free SGD, proposed in [Defazio et al., 2024], achieves optimal convergence rates without requiring the training horizon in advance, by replacing learning rate schedules with a principled form of iterate averaging. However, the method still requires tuning a base learning rate whose optimal value depends on unknown problem constants. In this work, we continue down this road by deriving Polyak-type step sizes for Schedule-Free SGD and Adam that compute the learning rate at each iteration from the sampled loss, gradient, and current iterates alone. We first propose an oracle variant that uses per-sample optimal function values and prove an $O(1/\\sqrt{t})$ anytime last-iterate rate for convex Lipschitz objectives. We then remove the oracle requirement with a safeguarded variant that replaces the unknown optimal values with any available lower bound, achieving the same rate up to a neighborhood that vanishes under interpolation. Both step sizes reduce to existing Polyak rules for standard SGD when momentum is set to zero, unifying standard and schedule-free Polyak methods. Numerical experiments on language modeling, including pretraining and distillation, show that the proposed methods match or surpass tuned Schedule-Free baselines while offering greater robustness to hyperparameter choices.","short_abstract":"Schedule-Free SGD, proposed in [Defazio et al., 2024], achieves optimal convergence rates without requiring the training horizon in advance, by replacing learning rate schedules with a principled form of iterate averaging. However, the method still requires tuning a base learning rate whose optimal value depends on unk...","url_abs":"https://arxiv.org/abs/2511.07767","url_pdf":"https://arxiv.org/pdf/2511.07767v2","authors":"[\"Dimitris Oikonomou\",\"Matthew Buchholz\",\"Yuen-Man Pun\",\"Robert M. Gower\",\"Nicolas Loizou\"]","published":"2025-11-11T02:33:24Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Language Model\"]","has_code":false}
