Beyond The Bellman Fixed Point: Geometry And Fast Policy Identification In Value Iteration
2026 · Donghwan Lee
Abstract
arXiv:2604.17457v3. Q-value iteration (Q-VI) is usually analyzed through the \(\gamma\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set (POSS), the set of \(Q\)-functions whose tie-broken greedy policies are optimal. The main result shows that Q-VI reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal{X}_1 = Q^* + \operatorname{span}(\mathbf{1})\), which is contained in the POSS. For every \(\epsilon > 0\), the distance to \(\mathcal{X}_1\) satisfies an exponential bound with rate \((\bar\rho + \epsilon)^k\), where \(\bar\rho\) is the joint spectral radius of the projected switching family restricted to directions transverse to \(\mathcal{X}_1\). When \(\bar\rho < \gamma\), this transverse convergence is
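The finite-time policy claim can be probed numerically. The sketch below is not the paper's analysis: it runs Q-VI on a synthetic random MDP, and the sizes, seed, discount, and helper names (`random_mdp`, `bellman`, `dist_to_X1`, `action_gap`) are illustrative assumptions. It also measures distance in the sup-norm, which the abstract does not specify. Two elementary facts guide what it tracks. First, because each row of the transition kernel sums to one, \(T(Q + c\mathbf{1}) = TQ + \gamma c\mathbf{1}\), so \(T\) maps \(\mathcal{X}_1\) into itself and \(\operatorname{dist}_\infty(TQ, \mathcal{X}_1) \le \gamma\,\operatorname{dist}_\infty(Q, \mathcal{X}_1)\). Second, if \(Q^*\) has a unique greedy action in every state with minimum action gap \(g > 0\), then \(\operatorname{dist}_\infty(Q, \mathcal{X}_1) < g/2\) already forces the greedy policy of \(Q\) to be optimal, even when \(\|Q - Q^*\|_\infty\) is still large. The per-step rates computed at the end are a crude empirical proxy, not the joint spectral radius \(\bar\rho\).

```python
import numpy as np

# Hypothetical helpers for illustration; none of these names come from the paper.
def random_mdp(n_states, n_actions, seed=0):
    """Synthetic MDP: P[s, a] is a next-state distribution, R[s, a] lies in [0, 1)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)
    R = rng.random((n_states, n_actions))
    return P, R

def bellman(Q, P, R, gamma):
    """Bellman optimality operator: (TQ)(s,a) = R(s,a) + gamma * E[max_a' Q(s',a')]."""
    return R + gamma * (P @ Q.max(axis=1))

def dist_to_X1(Q, Q_star):
    """Sup-norm distance from Q to X1 = Q* + span(1).

    min_c ||Q - Q* - c*1||_inf is attained at the midrange of e = Q - Q*,
    so the distance equals half the span (max e - min e).
    """
    e = Q - Q_star
    return 0.5 * (e.max() - e.min())

def action_gap(Q_star):
    """Smallest gap between the best and second-best action over all states."""
    top2 = np.sort(Q_star, axis=1)[:, -2:]
    return float((top2[:, 1] - top2[:, 0]).min())

S, A, gamma, n_iter = 6, 3, 0.9, 200
P, R = random_mdp(S, A, seed=0)

# Reference Q*: iterate long enough that gamma**k is negligible.
Q_star = np.zeros_like(R)
for _ in range(2000):
    Q_star = bellman(Q_star, P, R, gamma)
pi_star = Q_star.argmax(axis=1)
gap = action_gap(Q_star)

# Q-VI from Q_0 = 0: track full error, distance to X1, and greedy-policy optimality.
Q = np.zeros_like(R)
dists, sup_errs, optimal = [], [], []
for _ in range(n_iter):
    dists.append(dist_to_X1(Q, Q_star))
    sup_errs.append(float(np.abs(Q - Q_star).max()))
    optimal.append(bool(np.array_equal(Q.argmax(axis=1), pi_star)))
    Q = bellman(Q, P, R, gamma)

# First iteration after which the greedy policy stays optimal.
first_opt = next((k for k in range(n_iter) if all(optimal[k:])), None)

# Crude per-step decay rates over the first 10 iterations (not the JSR).
rate_X1 = (dists[10] / dists[0]) ** 0.1
rate_sup = (sup_errs[10] / sup_errs[0]) ** 0.1

print(f"action gap {gap:.4f}; greedy policy optimal for all k >= {first_opt}")
print(f"at k={first_opt}: ||Q_k - Q*|| = {sup_errs[first_opt]:.4g}, "
      f"dist to X1 = {dists[first_opt]:.4g}")
print(f"rate to X1 {rate_X1:.3f}, sup-norm rate {rate_sup:.3f}, gamma {gamma}")
```

Because a pure constant offset \(c\mathbf{1}\) from \(Q^*\) shrinks by exactly \(\gamma\) per step, `rate_sup` cannot be expected to beat \(\gamma\) in general, whereas `rate_X1` ignores that offset. A `rate_X1` well below `rate_sup` on a particular MDP is consistent with the case \(\bar\rho < \gamma\) described in the abstract.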
Related papers
- Revisiting Value Iteration: Unified Analysis Of Discounted And Average-reward Cases (2025)
- Simple And Optimal Methods For Stochastic Variational Inequalities, II: Markovian Noise And Policy Evaluation In Reinforcement Learning (2020)
- Greedy-gq With Variance Reduction: Finite-time Analysis And Improved Complexity (2021)
- Learning Near Optimal Policies With Low Inherent Bellman Error (2020)
- The Uncertainty Bellman Equation And Exploration (2017)
- Parameterized Projected Bellman Operator (2023)
- Careful At Estimation And Bold At Exploration (2023)
- Deflated Dynamics Value Iteration (2024)