Smoothing Policy Iteration For Zero-sum Markov Games
2022 · Yangang Ren, Yao Lyu, Wenxuan Wang, et al.
Abstract
Zero-sum Markov Games (MGs) have been an efficient framework for multi-agent systems and robust control, wherein a minimax problem is constructed to solve for the equilibrium policies. At present, this formulation is well studied in tabular settings, where the maximum operator is solved exactly to calculate the worst-case value function. However, it is non-trivial to extend such methods to complex tasks, as finding the maximum over large-scale action spaces is usually cumbersome. In this paper, we propose the smoothing policy iteration (SPI) algorithm to solve zero-sum MGs approximately, where the maximum operator is replaced by the weighted LogSumExp (WLSE) function to obtain nearly optimal equilibrium policies. Specifically, the adversarial policy serves as the weight function to enable efficient sampling over action spaces. We also prove the convergence of SPI and analyze its approximation error in the \(\infty\)-norm based on the contraction mapping theor
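The smoothing idea in the abstract can be illustrated with a small sketch. It assumes the common weighted LogSumExp form \(\frac{1}{\beta}\log\sum_a w(a)\,e^{\beta Q(a)}\), with weights \(w\) given by a probability distribution over actions; the abstract does not state the exact definition used in the paper, so the function name `wlse`, the temperature `beta`, and the toy values below are illustrative assumptions, not the authors' implementation.

```python
import math

def wlse(values, weights, beta):
    """Weighted LogSumExp: (1/beta) * log(sum_i w_i * exp(beta * v_i)).

    Assumed form for illustration; `weights` should be a probability
    distribution (e.g. an adversarial policy over actions).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Only actions with positive weight contribute to the sum.
    terms = [(v, w) for v, w in zip(values, weights) if w > 0]
    # Shift by the largest value so the exponentials cannot overflow.
    m = max(v for v, _ in terms)
    s = sum(w * math.exp(beta * (v - m)) for v, w in terms)
    return m + math.log(s) / beta

# Hypothetical action values and adversarial policy for one state.
q_values = [1.0, 2.0, 3.5]
policy = [0.2, 0.3, 0.5]

for beta in (1.0, 10.0, 100.0):
    # The smoothed value approaches max(q_values) from below as beta grows.
    print(beta, wlse(q_values, policy, beta))
```

Under this assumed form, the smoothed value never exceeds the true maximum and falls short of it by at most \(-\frac{1}{\beta}\log w(a^*)\), where \(a^*\) is the maximizing action. Because the sum is an expectation under the weight distribution, it can be estimated from sampled actions rather than by enumerating the whole action space.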
Related papers
- A New Policy Iteration Algorithm For Reinforcement Learning In Zero-sum Markov Games (2023)
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning From Offline Datasets (2022)
- A Generalized Minimax Q-learning Algorithm For Two-player Zero-sum Stochastic Games (2019)
- Empirical Policy Optimization For \(n\)-player Markov Games (2021)
- Policy Optimization For Continuous-time Linear-quadratic Graphon Mean Field Games (2025)
- Refined Sample Complexity For Markov Games With Independent Linear Function Approximation (2024)