A Finite Time Analysis Of Distributed Q-learning

Abstract

Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success of single-agent reinforcement learning (RL). In this study, we consider a distributed Q-learning scenario in which a number of agents cooperatively solve a sequential decision-making problem without access to the central reward function, which is the average of the local rewards. In particular, we present a finite-time analysis of a distributed Q-learning algorithm and provide a new sample complexity result of \(\tilde{\mathcal{O}}\left( \min\left\{ \frac{1}{\epsilon^2}\frac{t_{\text{mix}}}{(1-\gamma)^6 d_{\min}^4}, \frac{1}{\epsilon}\frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-\sigma_2(\boldsymbol{W}))(1-\gamma)^4 d_{\min}^3} \right\} \right)\) under tabular lookup
