{"ID":2884890,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2508.06661","arxiv_id":"2508.06661","title":"Convergence of Fast Policy Iteration in Markov Games and Robust MDPs","abstract":"Markov games and robust MDPs are closely related models that involve computing a pair of saddle point policies. As part of the long-standing effort to develop efficient algorithms for these models, the Filar-Tolwinski (FT) algorithm has shown considerable promise. As our first contribution, we demonstrate that FT may fail to converge to a saddle point and may loop indefinitely, even in small games. This observation contradicts the proof of FT's convergence to a saddle point in the original paper. As our second contribution, we propose Residual Conditioned Policy Iteration (RCPI). RCPI builds on FT, but is guaranteed to converge to a saddle point. Our numerical results show that RCPI outperforms other convergent algorithms by several orders of magnitude.","short_abstract":"Markov games and robust MDPs are closely related models that involve computing a pair of saddle point policies. As part of the long-standing effort to develop efficient algorithms for these models, the Filar-Tolwinski (FT) algorithm has shown considerable promise. As our first contribution, we demonstrate that FT may f...","url_abs":"https://arxiv.org/abs/2508.06661","url_pdf":"https://arxiv.org/pdf/2508.06661v2","authors":"[\"Keith Badger\",\"Jefferson Huang\",\"Marek Petrik\"]","published":"2025-08-08T19:25:28Z","proceeding":"cs.GT","tasks":"[\"cs.GT\"]","methods":"[]","has_code":false}