Exploration-exploitation In Multi-agent Competition: Convergence With Bounded Rationality
2021 · Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove
Abstract
The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings is obtained regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in compet
Related papers
- Beyond Strict Competition: Approximate Convergence Of Multi Agent Q-learning Dynamics (2023)
- Asymptotic Convergence And Performance Of Multi-agent Q-learning Dynamics (2023)
- On The Stability Of Learning In Network Games With Many Players (2024)
- Stability Of Multi-agent Learning In Competitive Networks: Delaying The Onset Of Chaos (2023)
- Convergence And Connectivity: Dynamics Of Multi-agent Q-learning In Random Networks (2025)
- Strategically Robust Multi-agent Reinforcement Learning With Linear Function Approximation (2026)
- On Information Asymmetry In Competitive Multi-agent Reinforcement Learning: Convergence And Optimality (2020)
- The Bounds Of Algorithmic Collusion; \(q\)-learning, Gradient Learning, And The Folk Theorem (2024)