Abstract

Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success of single-agent reinforcement learning (RL). In this study, we consider a distributed Q-learning scenario in which a number of agents cooperatively solve a sequential decision-making problem without access to the central reward function, which is the average of the local rewards. In particular, we present a finite-time analysis of a distributed Q-learning algorithm and provide a new sample complexity result of \(\tilde{\mathcal{O}}\left( \min\left\{ \frac{1}{\epsilon^2}\frac{t_{\text{mix}}}{(1-\gamma)^6 d_{\min}^4},\ \frac{1}{\epsilon}\frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-\sigma_2(\boldsymbol{W}))(1-\gamma)^4 d_{\min}^3} \right\}\right)\) under tabular lookup

Authors

(none)

Tags

  • Multi-Agent

Stats

  • arxiv key: lim2024a

Related papers