Abstract

Decentralized Actor-Critic (AC) algorithms have been widely used in multi-agent reinforcement learning (MARL) and have achieved remarkable empirical success. Despite this success, the theoretical convergence properties of decentralized AC algorithms remain largely unexplored. Most existing finite-time convergence results rely on either a double-loop update or a two-timescale step-size rule, and this is true even for centralized AC algorithms in the single-agent setting. In practice, the *single-timescale* update is widely used: the actor and critic are updated alternately, with step sizes of the same order. In this work, we study a decentralized *single-timescale* AC algorithm. Theoretically, using linear function approximation for value and reward estimation, we show that the algorithm has a sample complexity of \(\tilde{\mathcal{O}}(\epsilon^{-2})\) under Markovian sampling, which matches the optimal complexity of double-loop implementations.
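The single-timescale idea can be illustrated with a small, self-contained sketch. It is not the algorithm analyzed in the paper. It assumes a generic consensus-based actor-critic pattern, a hypothetical two-state Markov chain with three agents, one-hot features (a special case of linear approximation) for the value function \(\phi(s)\) and the global-reward estimator \(\psi(s,a)\), and a hypothetical doubly stochastic mixing matrix `W`. All names (`train`, `mix`, `alpha`, `beta`, `W`) are illustrative. The point is that the actor and critic are updated alternately along a single Markovian trajectory, with step sizes `alpha` and `beta` of the same order, and with no inner critic loop.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a short list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix(params, W):
    # Consensus step: agent i replaces its vector with sum_j W[i][j] * params[j].
    n, d = len(params), len(params[0])
    return [[sum(W[i][j] * params[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


def train(steps=10000, alpha=0.05, beta=0.05, seed=0):
    rng = random.Random(seed)
    n_agents, n_states, n_actions = 3, 2, 2
    n_joint = n_actions ** n_agents
    # Hypothetical doubly stochastic mixing matrix for the communication graph.
    W = [[0.5 if i == j else 0.25 for j in range(n_agents)] for i in range(n_agents)]
    theta = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(n_agents)]
    w = [[0.0] * n_states for _ in range(n_agents)]                # critic: V(s) = phi(s)^T w_i
    lam = [[0.0] * (n_states * n_joint) for _ in range(n_agents)]  # reward: psi(s,a)^T lam_i
    mu = [[0.0] for _ in range(n_agents)]                          # average-reward estimate
    s = 0
    for _ in range(steps):
        probs = [softmax(theta[i][s]) for i in range(n_agents)]
        a = [0 if rng.random() < probs[i][0] else 1 for i in range(n_agents)]
        # Hypothetical local reward: agent i is paid for matching the current state.
        r = [1.0 if a[i] == s else 0.0 for i in range(n_agents)]
        # One Markovian trajectory: the state persists with probability 0.7.
        s_next = s if rng.random() < 0.7 else 1 - s
        idx = s * n_joint + sum(a[i] << i for i in range(n_agents))  # one-hot index of psi(s, a)
        # Actor step: TD error built from the *estimated global* reward.
        for i in range(n_agents):
            delta_hat = lam[i][idx] - mu[i][0] + w[i][s_next] - w[i][s]
            for b in range(n_actions):
                grad_log_pi = (1.0 if b == a[i] else 0.0) - probs[i][b]
                theta[i][s][b] += alpha * delta_hat * grad_log_pi
        # Critic step from local rewards, with a step size of the same order as the actor's.
        for i in range(n_agents):
            delta = r[i] - mu[i][0] + w[i][s_next] - w[i][s]
            mu[i][0] += beta * (r[i] - mu[i][0])
            w[i][s] += beta * delta
            lam[i][idx] += beta * (r[i] - lam[i][idx])
        # Consensus: average critic and reward parameters with neighbors.
        w, lam, mu = mix(w, W), mix(lam, W), mix(mu, W)
        s = s_next
    policies = [[softmax(theta[i][st]) for st in range(n_states)] for i in range(n_agents)]
    return policies, w, lam
```

Calling `policies, w, lam = train()` returns each agent's per-state action probabilities together with its critic and reward-estimator weights. In this toy problem each agent's policy should come to favor the action that matches the state, while the consensus step keeps the agents' parameters close to one another.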

Authors

(none)

Tags

  • Multi-Agent

Stats

  • citations: 0
  • S2 citations: n/a
  • GitHub stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arXiv key: luo2022finite
