{"ID":2849882,"CreatedAt":"2026-06-01T04:54:23.091178241Z","UpdatedAt":"2026-06-01T04:54:23.091178241Z","DeletedAt":null,"paper_url":"https://arxiv.org/abs/2510.23914","arxiv_id":"2510.23914","title":"Revisiting Value Iteration: Unified Analysis of Discounted and Average-Reward Cases","abstract":"While Value Iteration (VI) is one of the most fundamental algorithms in Reinforcement Learning, its theoretical convergence guarantees still exhibit a persistent mismatch with empirical behavior. In the discounted-reward case, classical theory guarantees geometric convergence with rate $γ$, while in the average-reward case recent work suggests that only sublinear convergence can be expected. In practice, however, VI is often observed to converge significantly faster. In this work, we show through a unified geometry-based analysis that, under an assumption of a unique and unichain optimal policy, (i) convergence is geometric in both the discounted- and average-reward settings and (ii) the convergence rate is faster than previous analyses suggest.","short_abstract":"While Value Iteration (VI) is one of the most fundamental algorithms in Reinforcement Learning, its theoretical convergence guarantees still exhibit a persistent mismatch with empirical behavior. In the discounted-reward case, classical theory guarantees geometric convergence with rate $γ$, while in the average-reward...","url_abs":"https://arxiv.org/abs/2510.23914","url_pdf":"https://arxiv.org/pdf/2510.23914v2","authors":"[\"Arsenii Mustafin\",\"Xinyi Sheng\",\"Dominik Baumann\"]","published":"2025-10-27T22:42:53Z","proceeding":"cs.LG","tasks":"[\"cs.LG\"]","methods":"[\"Reinforcement Learning\"]","has_code":false}
