Revisiting Value Iteration: Unified Analysis Of Discounted And Average-reward Cases

Abstract

While Value Iteration (VI) is one of the most fundamental algorithms in Reinforcement Learning, its theoretical convergence guarantees still exhibit a persistent mismatch with empirical behavior. In the discounted-reward case, classical theory guarantees geometric convergence with rate \(\gamma\), while in the average-reward case recent work suggests that only sublinear convergence can be expected. In practice, however, VI is often observed to converge significantly faster. In this work, we show through a unified geometry-based analysis that, under an assumption of a unique and unichain optimal policy, (i) convergence is geometric in both the discounted- and average-reward settings and (ii) the convergence rate is faster than previous analyses suggest.
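The two convergence regimes contrasted above can be illustrated with a minimal sketch. It is not the paper's analysis and not its algorithm. It runs standard VI with discount \(\gamma = 0.9\) and checks the classical contraction bound \(\|V_{k+1} - V_k\|_\infty \le \gamma \|V_k - V_{k-1}\|_\infty\). It then runs relative VI, a common average-reward variant chosen here as an assumption, and tracks the classical bracket \(\min_s (Th - h)(s) \le g^* \le \max_s (Th - h)(s)\) on the optimal gain. The two-state MDP (`P`, `R`) and all function names are hypothetical and were chosen only for illustration. Every policy in this MDP is unichain, and its unique optimal policy (action 1 in both states) has gain \(16/9\).

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
# R[s][a] = expected one-step reward for taking action a in state s.
R = [[1.0, 0.0], [0.0, 2.0]]
N_STATES, N_ACTIONS = 2, 2


def bellman(v, gamma):
    """Bellman optimality operator; gamma = 1.0 gives the undiscounted operator."""
    return [
        max(
            R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(N_STATES))
            for a in range(N_ACTIONS)
        )
        for s in range(N_STATES)
    ]


def discounted_vi(gamma, iters):
    """Standard VI from V_0 = 0; returns the last iterate and the steps ||V_{k+1} - V_k||_inf."""
    v = [0.0] * N_STATES
    steps = []
    for _ in range(iters):
        v_new = bellman(v, gamma)
        steps.append(max(abs(x - y) for x, y in zip(v_new, v)))
        v = v_new
    return v, steps


def relative_vi(iters, ref=0):
    """Relative VI for the average-reward case (a standard variant, assumed here).

    Returns (h, lo, hi), where lo = min(T h - h) and hi = max(T h - h) for the
    last pre-update iterate h; these bracket the optimal gain g* when g* is
    state-independent, as in this unichain example.
    """
    h = [0.0] * N_STATES
    lo, hi = float("-inf"), float("inf")
    for _ in range(iters):
        th = bellman(h, 1.0)
        diff = [x - y for x, y in zip(th, h)]
        lo, hi = min(diff), max(diff)
        # Subtract the value at a reference state so the iterates stay bounded.
        h = [x - th[ref] for x in th]
    return h, lo, hi


if __name__ == "__main__":
    _, steps = discounted_vi(gamma=0.9, iters=50)
    ratios = [b / a for a, b in zip(steps, steps[1:]) if a > 0]
    # Bounded by gamma = 0.9 (up to floating-point rounding) by the contraction property.
    print("max step ratio (discounted):", max(ratios))

    _, lo, hi = relative_vi(iters=10)
    print("gain bracket after 10 iterations:", lo, hi)
```

In this toy instance the bracket width `hi - lo` shrinks by a factor of about 10 per iteration once the greedy policy settles. That rate matches the second eigenvalue 0.1 of the optimal policy's transition matrix \([[0.2, 0.8], [0.1, 0.9]]\). This is only an observation about this example and not a restatement of the paper's bounds.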
