Learning Nash Equilibria In Zero-sum Stochastic Games Via Entropy-regularized Policy Approximation
2020 · Yue Guan, Qifan Zhang, Panagiotis Tsiotras
Abstract
We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum stochastic games. We propose a new Q-learning type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy during the Q-function updates. We prove that, under certain conditions, updating the regularized Q-function drives the algorithm to a Nash equilibrium. We also demonstrate the proposed algorithm's ability to transfer previous training experience, enabling the agents to adapt quickly to new environments. We provide a dynamic hyper-parameter scheduling scheme to further expedite convergence. Empirical results on several stochastic games verify that the proposed algorithm converges to the Nash equilibrium and exhibits a major speed-up over existing algorithms.
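The abstract does not spell out the form of the soft policies, so the following is only a generic sketch of what an entropy-regularized (Boltzmann) policy looks like in a single-state zero-sum matrix game, not the authors' algorithm. The function names, the inverse temperature `beta`, and the payoff layout `Q[a][b]` (row player maximizes, column player minimizes) are all assumptions made for illustration.

```python
import math

def soft_policy(values, beta):
    """Boltzmann (softmax) policy over action values.

    beta is an assumed inverse temperature: beta = 0 gives the uniform
    policy, and a large beta concentrates mass on the best action.
    """
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

def soft_value(values, beta):
    """Log-sum-exp value (1/beta) * log(sum(exp(beta * v))), for beta > 0.

    It lies between max(values) and max(values) + log(n) / beta, so it
    approaches the hard max as beta grows.
    """
    m = max(values)
    return m + math.log(sum(math.exp(beta * (v - m)) for v in values)) / beta

def soft_responses(Q, pi_max, pi_min, beta):
    """One round of soft best responses in a zero-sum matrix game.

    Q[a][b] is the payoff to the maximizing row player; the column
    player receives -Q[a][b]. Each player responds softly to the
    other's current mixed policy.
    """
    n_a, n_b = len(Q), len(Q[0])
    row_vals = [sum(Q[a][b] * pi_min[b] for b in range(n_b)) for a in range(n_a)]
    col_vals = [-sum(Q[a][b] * pi_max[a] for a in range(n_a)) for b in range(n_b)]
    return soft_policy(row_vals, beta), soft_policy(col_vals, beta)

# Matching pennies: the uniform profile is a fixed point of the soft responses.
Q = [[1.0, -1.0], [-1.0, 1.0]]
new_max, new_min = soft_responses(Q, [0.5, 0.5], [0.5, 0.5], beta=5.0)
```

In a Q-learning loop, a closed-form policy of this kind could stand in for the exact Nash policy that would otherwise require solving a matrix game at every update, which is the kind of cost the abstract refers to. How the paper schedules the regularization strength is not stated here, so `beta` is left as a plain argument.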
Related papers
- Policy Optimization Finds Nash Equilibrium In Regularized General-sum LQ Games (2024) [0.00]
- Two-timescale Q-learning With Function Approximation In Zero-sum Stochastic Games (2023) [0.00]
- Fast Policy Extragradient Methods For Competitive Games With Entropy Regularization (2021) [0.00]
- Learning In Zero-sum Markov Games: Relaxing Strong Reachability And Mixing Time Assumptions (2023) [0.00]
- Linear Convergence Of Independent Natural Policy Gradient In Games With Entropy Regularization (2024) [3.58]
- On The Convergence Of Policy Gradient Methods To Nash Equilibria In General Stochastic Games (2022) [0.00]
- Beyond Exact Gradients: Convergence Of Stochastic Soft-max Policy Gradient Methods With Entropy Regularization (2021) [2.26]
- Fast Policy Learning For Linear Quadratic Control With Entropy Regularization (2023) [0.00]