Revisiting Value Iteration: Unified Analysis Of Discounted And Average-reward Cases
2025 Β· Arsenii Mustafin, Xinyi Sheng, Dominik Baumann
Abstract
While Value Iteration (VI) is one of the most fundamental algorithms in Reinforcement Learning, its theoretical convergence guarantees still exhibit a persistent mismatch with empirical behavior. In the discounted-reward case, classical theory guarantees geometric convergence with rate \(\gamma\), while in the average-reward case recent work suggests that only sublinear convergence can be expected. In practice, however, VI is often observed to converge significantly faster. In this work, we show through a unified geometry-based analysis that, under an assumption of a unique and unichain optimal policy, (i) convergence is geometric in both the discounted- and average-reward settings and (ii) the convergence rate is faster than previous analyses suggest.
Authors
(none)
Tags
Stats
Related papers
- Deflated Dynamics Value Iteration (2024)0.00
- Simple And Optimal Methods For Stochastic Variational Inequalities, II: Markovian Noise And Policy Evaluation In Reinforcement Learning (2020)8.60
- Unifying Value Iteration, Advantage Learning, And Dynamic Policy Programming (2017)0.00
- Beyond The Bellman Fixed Point: Geometry And Fast Policy Identification In Value Iteration (2026)0.00
- On The Convergence Of Policy Iteration-based Reinforcement Learning With Monte Carlo Policy Evaluation (2023)0.00
- Examining Average And Discounted Reward Optimality Criteria In Reinforcement Learning (2021)0.00
- On Convergence Of Average-reward Off-policy Control Algorithms In Weakly Communicating Mdps (2022)0.00
- Convex Programs And Lyapunov Functions For Reinforcement Learning: A Unified Perspective On The Analysis Of Value-based Methods (2022)2.26