A New Policy Iteration Algorithm For Reinforcement Learning In Zero-sum Markov Games
2023 Β· Anna Winnicki, R. Srikant
Abstract
Optimal policies in standard MDPs can be obtained using either value iteration or policy iteration. However, in the case of zero-sum Markov games, there is no efficient policy iteration algorithm; e.g., it has been shown that one has to solve Omega(1/(1-alpha)) MDPs, where alpha is the discount factor, to implement the only known convergent version of policy iteration. Another algorithm, called naive policy iteration, is easy to implement but is only provably convergent under very restrictive assumptions. Prior attempts to fix naive policy iteration algorithm have several limitations. Here, we show that a simple variant of naive policy iteration for games converges exponentially fast. The only addition we propose to naive policy iteration is the use of lookahead policies, which are anyway used in practical algorithms. We further show that lookahead can be implemented efficiently in the function approximation setting of linear Markov games, which are the counterpart of the much-studied
Authors
(none)
Tags
Stats
Related papers
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)0.00
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)0.00
- A Generalized Minimax Q-learning Algorithm For Two-player Zero-sum Stochastic Games (2019)9.03
- Smoothing Policy Iteration For Zero-sum Markov Games (2022)3.58
- Improving Sample Efficiency Of Model-free Algorithms For Zero-sum Markov Games (2023)0.00
- Learning Nash Equilibria In Zero-sum Stochastic Games Via Entropy-regularized Policy Approximation (2020)0.00
- Learning In Zero-sum Markov Games: Relaxing Strong Reachability And Mixing Time Assumptions (2023)0.00
- Can We Find Nash Equilibria At A Linear Rate In Markov Games? (2023)0.00