History-Aware Adaptive High-Order Tensor Regularization
Abstract
In this paper, we develop a new adaptive regularization method for minimizing a composite function, which is the sum of a function whose $p$th-order derivative is Lipschitz continuous ($p \ge 1$) and a simple, convex, and possibly nonsmooth function. We use a history of local Lipschitz estimates to adaptively select the current regularization parameter, an approach we term the \textit{history-aware adaptive regularization method}. We explore how the amount of historical information used affects both theoretical and practical performance. By using all the historical information, our method matches the complexity guarantees of the standard $p$th-order tensor methods that require a known Lipschitz constant, for both convex and nonconvex objectives. In the nonconvex case with $p \ge 2$, the number of iterations required to find an $(\epsilon_g,\epsilon_H)$-approximate second-order stationary point is bounded by $\mathcal{O}(\max\{\epsilon_g^{-(p+1)/p}, \epsilon_H^{-(p+1)/(p-1)}\})$. For convex functions, we establish an $\mathcal{O}(\epsilon^{-1/p})$ iteration complexity for finding an $\epsilon$-approximate optimal point and further propose an accelerated variant attaining an iteration complexity of $\mathcal{O}(\epsilon^{-1/(p+1)})$. For practical purposes, we also propose several variants of this method that use only part of the historical information. Specifically, we introduce cyclic and sliding-window strategies for choosing historical Lipschitz estimates, which mitigate overly conservative updates. As long as a rough upper bound on the Lipschitz constant is known, these two variants achieve the same iteration complexity guarantees, in terms of the input accuracy, as the method using full historical information. Finally, extensive numerical experiments demonstrate the effectiveness of our adaptive approach.
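To make the idea of selecting the regularization parameter from a history of local Lipschitz estimates concrete, the following is a minimal sketch for the simplest setting, $p = 1$ with no nonsmooth term. It is an illustration under stated assumptions, not the method analyzed in the paper: the function name `history_aware_gd`, the secant-type estimate $\|\nabla f(x_{k+1}) - \nabla f(x_k)\| / \|x_{k+1} - x_k\|$, the rule "take the maximum of the stored estimates", and the use of the initial guess `m0` as a floor are all hypothetical choices made for this example. Setting `window=None` keeps the full history, while `window=w` mimics a sliding-window strategy.

```python
from collections import deque
import math

def history_aware_gd(grad, x0, m0=2.0, window=None, iters=300):
    """Illustrative first-order (p = 1) scheme: x+ = x - grad(x) / M.

    M is the largest stored local Lipschitz estimate, floored at m0.
    window=None keeps every estimate; window=w keeps only the last w.
    All update rules here are assumptions made for illustration.
    """
    history = deque([m0], maxlen=window)      # maxlen=None means unbounded
    x = list(x0)
    g = grad(x)
    for _ in range(iters):
        m = max(m0, max(history))             # current regularization parameter
        x_new = [xi - gi / m for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        step = math.dist(x_new, x)
        if step > 0.0:                        # skip the estimate at a fixed point
            history.append(math.dist(g_new, g) / step)
        x, g = x_new, g_new
    return x, max(history)

# Toy problem (hypothetical): f(x) = 0.5 * (x1^2 + 10 * x2^2), gradient Lipschitz constant 10.
def quad_grad(x):
    return [1.0 * x[0], 10.0 * x[1]]

x_full, m_full = history_aware_gd(quad_grad, [1.0, 1.0])            # full history
x_win, m_win = history_aware_gd(quad_grad, [1.0, 1.0], window=3)    # sliding window
```

On this toy problem the full-history parameter can only grow and settles near the true constant 10, whereas the sliding-window parameter is allowed to drop once large estimates leave the window, which permits longer steps at the price of occasional re-estimation.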