On Information Asymmetry In Competitive Multi-agent Reinforcement Learning: Convergence And Optimality
2020 · Ezra Tampubolon, Haris Ceribasic, Holger Boche
Abstract
In this work, we study a system of two interacting, non-cooperative Q-learning agents in which one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population learning, which generally does not occur in an environment of general independent learners. The resulting post-learning policies are almost optimal in the sense of the underlying game, i.e., they form a Nash equilibrium. Furthermore, we propose a Q-learning algorithm that requires predictive observation of two subsequent actions of the opponent and yields an optimal strategy, provided the opponent applies a stationary strategy. We also discuss the existence of a Nash equilibrium in the underlying information-asymmetric game.
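The information structure described above (one agent observes the other's action, the other does not) can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a stateless, repeated 2x2 zero-sum game (matching pennies), epsilon-greedy exploration, and a constant step size, and the names `train` and `eps_greedy` are hypothetical. The informed agent B keeps a Q-table indexed by A's observed action and its own action, while the uninformed agent A keeps one value per own action.

```python
import random

# Hypothetical payoffs (matching pennies): A is paid +1 when actions match,
# and B is paid the negative of A's payoff (zero-sum).
R_A = [[1.0, -1.0], [-1.0, 1.0]]
R_B = [[-r for r in row] for row in R_A]


def eps_greedy(q_values, eps, rng):
    """Random action with probability eps, else greedy (ties go to the lowest index)."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def train(steps=5000, alpha=0.1, eps=0.1, seed=0):
    rng = random.Random(seed)
    q_a = [0.0, 0.0]                # uninformed agent A: value per own action
    q_b = [[0.0, 0.0], [0.0, 0.0]]  # informed agent B: value per (A's action, own action)
    for _ in range(steps):
        a = eps_greedy(q_a, eps, rng)
        b = eps_greedy(q_b[a], eps, rng)  # B observes a before choosing b
        # Stateless (single-stage) Q-learning: no next state, so the target is the reward.
        q_a[a] += alpha * (R_A[a][b] - q_a[a])
        q_b[a][b] += alpha * (R_B[a][b] - q_b[a][b])
    return q_a, q_b


q_a, q_b = train()
# B's greedy reply to each of A's actions: in this toy game, the mismatching action.
greedy_b = [max(range(2), key=lambda b: q_b[a][b]) for a in range(2)]
print(greedy_b)  # expected: [1, 0]
```

In this toy setting, B's Q-values for each observed action of A move monotonically toward deterministic rewards, so B's greedy policy settles on the best response; the paper's own setting and its algorithm with predictive observation of two subsequent opponent actions are more general than this sketch.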
Related papers
- Asymptotic Convergence And Performance Of Multi-agent Q-learning Dynamics (2023)
- Beyond Strict Competition: Approximate Convergence Of Multi Agent Q-learning Dynamics (2023)
- Exploration-exploitation In Multi-agent Competition: Convergence With Bounded Rationality (2021)
- On The Stability Of Learning In Network Games With Many Players (2024)
- Stability Of Multi-agent Learning In Competitive Networks: Delaying The Onset Of Chaos (2023)
- Convergence And Connectivity: Dynamics Of Multi-agent Q-learning In Random Networks (2025)
- Partial-information Q-learning For General Two-player Stochastic Games (2023)
- Convergence Analysis Of Gradient-based Learning With Non-uniform Learning Rates In Non-cooperative Multi-agent Settings (2019)