{"ID":2872370,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2509.09863","arxiv_id":"2509.09863","title":"Off Policy Lyapunov Stability in Reinforcement Learning","abstract":"Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.","short_abstract":"Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces...","url_abs":"https://arxiv.org/abs/2509.09863","url_pdf":"https://arxiv.org/pdf/2509.09863v2","authors":"[\"Sarvan Gill\",\"Daniela Constantinescu\"]","published":"2025-09-11T21:34:08Z","proceeding":"eess.SY","tasks":"[\"eess.SY\",\"cs.LG\",\"cs.RO\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}