{"ID":2894785,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2507.10120","arxiv_id":"2507.10120","title":"A Variance-Reduced Cubic-Regularized Newton for Policy Optimization","abstract":"In this paper, we study a second-order approach to policy optimization in reinforcement learning. Existing second-order methods often suffer from suboptimal sample complexity or rely on unrealistic assumptions about importance sampling. To overcome these limitations, we propose VR-CR-PN, a variance-reduced cubic-regularized policy Newton algorithm. To the best of our knowledge, this is the first algorithm that integrates Hessian-aided variance reduction with second-order policy optimization, effectively addressing the distribution shift problem and achieving best-known sample complexity under general nonconvex conditions but without the need for importance sampling. We theoretically establish that VR-CR-PN achieves a sample complexity of $\\tilde{\\mathcal{O}}(ε^{-3})$ to reach an $ε$-second-order stationary point, significantly improving upon the previous best result of $\\tilde{\\mathcal{O}}(ε^{-3.5})$ under comparable assumptions. As an additional contribution, we introduce a novel Hessian estimator for the expected return function, which admits a uniform upper bound independent of the horizon length $H$, allowing the algorithm to achieve horizon-independent sample complexity.","short_abstract":"In this paper, we study a second-order approach to policy optimization in reinforcement learning. Existing second-order methods often suffer from suboptimal sample complexity or rely on unrealistic assumptions about importance sampling. To overcome these limitations, we propose VR-CR-PN, a variance-reduced cubic-regula...","url_abs":"https://arxiv.org/abs/2507.10120","url_pdf":"https://arxiv.org/pdf/2507.10120v1","authors":"[\"Cheng Sun\",\"Zhen Zhang\",\"Shaofu Yang\"]","published":"2025-07-14T10:04:02Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
