{"ID":2866850,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.18508","arxiv_id":"2509.18508","title":"CaCuTe: Casual Cubic-Model Technique for Faster Optimization","abstract":"We establish a local $\\mathcal{O}(k^{-2})$ rate for the gradient update $x^{k+1}=x^k-\\nabla f(x^k)/\\sqrt{H\\|\\nabla f(x^k)\\|}$ under a $2H$-Hessian--Lipschitz assumption. Regime detection relies on Hessian--vector products, avoiding Hessian formation or factorization. Incorporating this certificate into cubic-regularized Newton (CRN) and an accelerated variant enables per-iterate switching between the cubic and gradient steps while preserving CRN's global guarantees. The technique achieves the lowest wall-clock time among compared baselines in our experiments. In the first-order setting, the technique yields a monotone, adaptive, parameter-free method that inherits the local $\\mathcal{O}(k^{-2})$ rate. Despite backtracking, the method shows superior wall-clock performance. Additionally, we cover smoothness relaxations beyond classical gradient--Lipschitzness, enabling tighter bounds, including global $\\mathcal{O}(k^{-2})$ rates. Finally, we generalize the technique to the stochastic setting.","short_abstract":"We establish a local $\\mathcal{O}(k^{-2})$ rate for the gradient update $x^{k+1}=x^k-\\nabla f(x^k)/\\sqrt{H\\|\\nabla f(x^k)\\|}$ under a $2H$-Hessian--Lipschitz assumption. Regime detection relies on Hessian--vector products, avoiding Hessian formation or factorization. Incorporating this certificate into cubic-regularize...","url_abs":"https://arxiv.org/abs/2509.18508","url_pdf":"https://arxiv.org/pdf/2509.18508v1","authors":"[\"Nazarii Tupitsa\"]","published":"2025-09-23T01:16:59Z","proceeding":"math.OC","tasks":"[\"math.OC\"]","methods":"[]","has_code":false}
