A Black-box Approach For Non-stationary Multi-agent Reinforcement Learning

Abstract

We investigate learning the equilibria in non-stationary multi-agent systems and address the challenges that differentiate multi-agent learning from single-agent learning. Specifically, we focus on games with bandit feedback, where testing an equilibrium can result in substantial regret even when the gap to be tested is small, and the existence of multiple optimal solutions (equilibria) in stationary games poses extra challenges. To overcome these obstacles, we propose a versatile black-box approach applicable to a broad spectrum of problems, such as general-sum games, potential games, and Markov games, when equipped with appropriate learning and testing oracles for stationary environments. Our algorithms can achieve \(\widetilde{O}\left(\Delta^{1/4}T^{3/4}\right)\) regret when the degree of nonstationarity, as measured by total variation \(\Delta\), is known, and \(\widetilde{O}\left(\Delta^{1/5}T^{4/5}\right)\) regret when \(\Delta\) is unknown, where \(T\) is the number
