Finite-time Analysis Of Fully Decentralized Single-timescale Actor-critic

Abstract

Decentralized Actor-Critic (AC) algorithms have been widely used in multi-agent reinforcement learning (MARL) and have achieved remarkable empirical success. Despite this success, the theoretical convergence properties of decentralized AC algorithms remain largely unexplored. Most existing finite-time convergence results rely on either a double-loop update or a two-timescale step-size rule, and this is true even for the centralized AC algorithm in the single-agent setting. In practice, the *single-timescale* update is widely used: the actor and the critic are updated in an alternating manner with step sizes of the same order. In this work, we study a decentralized *single-timescale* AC algorithm. Theoretically, using linear approximation for value and reward estimation, we show that the algorithm has a sample complexity of \(\tilde{\mathcal{O}}(\epsilon^{-2})\) under Markovian sampling, which matches the optimal complexity achieved with a double-loop implementation.
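To make the single-timescale pattern concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm, and every name and parameter in it is hypothetical. Assumed setup: a small random MDP with a shared state, agents that each choose a local action and see only their own local reward, an observable joint action, a ring communication graph with a doubly stochastic mixing matrix `W`, a linear critic on random state features, a linear reward estimator with one-hot (tabular) features, and tabular softmax actors. A discounted TD(0) critic is used for simplicity. Each iteration takes one critic step and one reward-estimator step, each followed by consensus averaging with neighbours, then one actor step, with step sizes `alpha` and `beta` of the same order.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def run_decentralized_ac(n_agents=3, n_states=4, n_actions=2, d=3, steps=2000,
                         alpha=0.05, beta=0.05, gamma=0.9, seed=0):
    """Hypothetical single-timescale decentralized AC on a toy random MDP."""
    rng = np.random.default_rng(seed)
    n_joint = n_actions ** n_agents
    # Toy environment (assumption): P[s, joint] is a next-state distribution,
    # R[i, s, joint] is agent i's private local reward in [0, 1).
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_joint))
    R = rng.uniform(0.0, 1.0, size=(n_agents, n_states, n_joint))
    phi = rng.normal(size=(n_states, d))  # linear critic features (assumed)

    # Doubly stochastic mixing matrix for a ring graph (assumed topology).
    W = np.zeros((n_agents, n_agents))
    for i in range(n_agents):
        W[i, i] += 0.5
        W[i, (i + 1) % n_agents] += 0.25
        W[i, (i - 1) % n_agents] += 0.25

    w = np.zeros((n_agents, d))                        # critic weights, one row per agent
    lam = np.zeros((n_agents, n_states * n_joint))     # reward estimator, one-hot features
    theta = np.zeros((n_agents, n_states, n_actions))  # tabular softmax actor per agent

    s = 0
    for _ in range(steps):
        # Each agent samples a local action from its own policy.
        probs = [softmax(theta[i, s]) for i in range(n_agents)]
        a = tuple(int(rng.choice(n_actions, p=probs[i])) for i in range(n_agents))
        joint = int(np.ravel_multi_index(a, (n_actions,) * n_agents))
        s_next = int(rng.choice(n_states, p=P[s, joint]))
        r = R[:, s, joint]  # agent i only uses r[i]
        idx = s * n_joint + joint

        # Critic: local TD(0) step on each agent's own reward, then consensus.
        delta = r + gamma * (phi[s_next] @ w.T) - phi[s] @ w.T
        w = W @ (w + beta * delta[:, None] * phi[s][None, :])

        # Reward estimator: local step toward r[i], then consensus, so each
        # agent tracks the team-average reward at visited (s, joint) pairs.
        lam_half = lam.copy()
        lam_half[:, idx] += beta * (r - lam[:, idx])
        lam = W @ lam_half

        # Actor: same-order step size, driven by a TD error built from the
        # estimated team reward instead of the private local reward.
        delta_bar = lam[:, idx] + gamma * (phi[s_next] @ w.T) - phi[s] @ w.T
        for i in range(n_agents):
            grad_log_pi = -probs[i]
            grad_log_pi[a[i]] += 1.0  # gradient of log-softmax w.r.t. theta[i, s]
            theta[i, s] += alpha * delta_bar[i] * grad_log_pi

        s = s_next
    return w, theta, lam
```

Calling `run_decentralized_ac(steps=500)` returns the per-agent critic weights, actor parameters, and reward estimates. Because the critic and actor both take one constant-size step per sample, neither is run to convergence before the other moves; this is the single-timescale scheme, as opposed to a double-loop or two-timescale one.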
