{"ID":2860304,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.04189","arxiv_id":"2510.04189","title":"Finite Time Analysis of Constrained Natural Critic-Actor Algorithm with Improved Sample Complexity","abstract":"Recent studies have increasingly focused on non-asymptotic convergence analyses for actor-critic (AC) algorithms. One such effort introduced a two-timescale critic-actor algorithm for the discounted cost setting using a tabular representation, where the usual roles of the actor and critic are reversed. However, only asymptotic convergence was established there. Subsequently, both asymptotic and non-asymptotic analyses of the critic-actor algorithm with linear function approximation were conducted. In our work, we introduce the first natural critic-actor algorithm with function approximation for the long-run average cost setting and under inequality constraints. We provide the non-asymptotic convergence guarantees for this algorithm. Our analysis establishes optimal learning rates and we also propose a modification to enhance sample complexity. We further show the results of experiments on three different Safety-Gym environments where our algorithm is found to be competitive in comparison with other well known algorithms.","short_abstract":"Recent studies have increasingly focused on non-asymptotic convergence analyses for actor-critic (AC) algorithms. One such effort introduced a two-timescale critic-actor algorithm for the discounted cost setting using a tabular representation, where the usual roles of the actor and critic are reversed. However, only as...","url_abs":"https://arxiv.org/abs/2510.04189","url_pdf":"https://arxiv.org/pdf/2510.04189v1","authors":"[\"Prashansa Panda\",\"Shalabh Bhatnagar\"]","published":"2025-10-05T13:02:38Z","proceeding":"cs.LG","tasks":"[\"cs.LG\",\"cs.AI\"]","methods":"[]","has_code":false}