A Unified Approach To Reinforcement Learning, Quantal Response Equilibria, And Two-player Zero-sum Games
2022 · Samuel Sokota, Ryan D'Orazio, J. Zico Kolter, et al.
Abstract
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is demonstrating the virtues of magnetic mirror descent both as an equilibrium solver and as an approach to reinforcement learning in two-player zero-sum games. These virtues include: 1) being the first quantal response equilibrium solver to achieve linear convergence for extensive-form games with first-order feedback; 2) being the first standard reinforcement learning algorithm to achieve results empirically competitive with CFR in tabular settings; 3) achieving favorable performance in 3x3 Dark Hex and Phantom Tic-Tac-Toe as a self-play deep reinforcement learning algorithm.
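To make the idea of magnetic mirror descent more concrete, here is a minimal sketch on a normal-form game (the abstract's main results concern extensive-form games, which this does not cover). It assumes the negative-entropy mirror map, under which the update has a closed form of the shape pi_new ∝ (pi * magnet^(alpha*eta) * exp(eta*q))^(1/(1+alpha*eta)), and it assumes a uniform magnet, a stepsize `eta`, and a regularization temperature `alpha`. The function names, the rock-paper-scissors payoff matrix, and all parameter values are illustrative choices and do not come from the abstract.

```python
import numpy as np

def mmd_step(policy, q_values, magnet, eta, alpha):
    """One magnetic-mirror-descent-style update in closed form.

    Assumes the negative-entropy mirror map, so the proximal step reduces to
    a weighted geometric mix of the current policy, the magnet, and exp(q).
    """
    log_new = (np.log(policy) + alpha * eta * np.log(magnet) + eta * q_values)
    log_new /= (1.0 + alpha * eta)
    log_new -= log_new.max()          # stabilise the exponentiation
    new_policy = np.exp(log_new)
    return new_policy / new_policy.sum()

def solve_matrix_game(payoff, eta=0.1, alpha=1.0, steps=2000):
    """Simultaneous self-play on a zero-sum matrix game (row player maximises)."""
    n_rows, n_cols = payoff.shape
    # Deliberately non-uniform starting policies (an arbitrary choice).
    x = np.arange(1, n_rows + 1, dtype=float)
    x /= x.sum()
    y = np.arange(n_cols, 0, -1, dtype=float)
    y /= y.sum()
    magnet_x = np.full(n_rows, 1.0 / n_rows)   # uniform magnet (assumption)
    magnet_y = np.full(n_cols, 1.0 / n_cols)
    for _ in range(steps):
        q_x = payoff @ y            # row player's action values
        q_y = -payoff.T @ x         # column player's action values
        x, y = (mmd_step(x, q_x, magnet_x, eta, alpha),
                mmd_step(y, q_y, magnet_y, eta, alpha))
    return x, y

# Rock-paper-scissors: by symmetry, the entropy-regularised equilibrium is uniform.
RPS = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
x, y = solve_matrix_game(RPS)
print(x, y)
```

With a uniform magnet, the fixed point of this update is the entropy-regularized equilibrium at temperature `alpha`, which is why rock-paper-scissors is a convenient sanity check: both policies should drift toward the uniform distribution. The stepsize here is kept small relative to `alpha`; larger values may not converge.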
Related papers
- Asynchronous Gradient Play In Zero-sum Multi-agent Games (2022)
- Decentralized Q-learning In Zero-sum Markov Games (2021)
- Population-aware Online Mirror Descent For Mean-field Games By Deep Reinforcement Learning (2024)
- Last-iterate Convergence Of Decentralized Optimistic Gradient Descent/ascent In Infinite-horizon Competitive Markov Games (2021)
- Exploration-exploitation In Multi-agent Competition: Convergence With Bounded Rationality (2021)
- A Unified Perspective On Deep Equilibrium Finding (2022)
- Reevaluating Policy Gradient Methods For Imperfect-information Games (2025)
- On The Variational Interpretation Of Mirror Play In Monotone Games (2024)