{"ID":2854153,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.15714","arxiv_id":"2510.15714","title":"A Split-Client Approach to Second-Order Optimization","abstract":"Second-order optimization methods offer superior convergence rates but are often bottlenecked by the wall-clock cost of Hessian computation and factorization. In the moderate-dimensional regime where the full Hessian fits in memory, factorization $\\mathcal{O}(d^3)$ typically dominates gradient evaluation $\\mathcal{O}(nd)$, creating a synchronization barrier that negates the per-iteration progress of classical second-order methods. We propose the \\emph{Split-Client} framework, which decouples optimization into parallel gradient and curvature processes. Unlike Lazy Hessian approaches, whose arithmetic-complexity analysis does not charge factorization time and whose optimal reuse frequency requires tuning, our method is fully \\textbf{delay-adaptive}: its wall-clock complexity scales with the \\emph{average} delay $\\Barτ$, and it matches the optimally-tuned Lazy rate of $\\mathcal{O}(\\eps^{-3/2}\\sqrt{\\Barτ})$ without any tuning. For persistent curvature error, we provide a noise-adaptive schedule with $\\widetilde{\\mathcal{O}}(T^{-3/4})$ rate (on $E[\\|\\nabla f\\|]^{3/2}$), recovering the rate that uniform-error analyses such as Kamzolov et al (2023) achieve via inflated regularization. Under a verifiable subspace-alignment condition, an additional \\emph{structured} analysis based on the secant condition of L-BFGS gives a faster $\\mathcal{O}(T^{-1})$ rate, with a hybrid theorem interpolating smoothly between the two regimes. We extend the framework to Subsampled Cubic Newton with adaptive batch sizes and an aggregate sampling budget linear in $T$. Experiments on two non-convex problems show wall-clock speedups of up to $800\\times$ over Vanilla and $30\\times$ over Lazy in the strongly factorization-dominated regime.","short_abstract":"Second-order optimization methods offer superior convergence rates but are often bottlenecked by the wall-clock cost of Hessian computation and factorization. In the moderate-dimensional regime where the full Hessian fits in memory, factorization $\\mathcal{O}(d^3)$ typically dominates gradient evaluation $\\mathcal{O}(n...","url_abs":"https://arxiv.org/abs/2510.15714","url_pdf":"https://arxiv.org/pdf/2510.15714v3","authors":"[\"El Mahdi Chayti\",\"Martin Jaggi\"]","published":"2025-10-17T14:58:11Z","proceeding":"math.OC","tasks":"[\"math.OC\",\"cs.LG\"]","methods":"[]","has_code":false}
