Multi-agent Reinforcement Learning With Reward Delays

Abstract

This paper considers multi-agent reinforcement learning (MARL) in which rewards are received after delays, and the delay time varies across agents and across time steps. Building on the V-learning framework, this paper proposes MARL algorithms that handle reward delays efficiently. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) at rate \(\tilde{\mathcal{O}}\left(\frac{H^3\sqrt{S\mathcal{T}_K}}{K}+\frac{H^3\sqrt{SA}}{\sqrt{K}}\right)\), where \(K\) is the number of episodes, \(H\) is the planning horizon, \(S\) is the size of the state space, \(A\) is the size of the largest action space, and \(\mathcal{T}_K\) is a measure of total delay formally defined in the paper. Moreover, our algorithm is extended to cases with infinite delays through a reward skipping scheme, achieving a convergence rate similar to that of the finite-delay case.
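
To get a feel for how the two terms of the stated rate compare, the following is a minimal sketch that evaluates them numerically. It drops the constants and logarithmic factors hidden by the \(\tilde{\mathcal{O}}\) notation, and the function names and the `total_delay` argument (standing in for \(\mathcal{T}_K\), whose formal definition is given in the paper and not reproduced here) are hypothetical names chosen for illustration.

```python
import math


def cce_rate_terms(H, S, A, K, total_delay):
    """Evaluate the two terms of the stated rate.

    Constants and log factors are ignored, so the values are only
    meaningful relative to each other. `total_delay` is a stand-in
    for T_K (an assumption; see the paper for its definition).
    """
    delay_term = H**3 * math.sqrt(S * total_delay) / K
    standard_term = H**3 * math.sqrt(S * A) / math.sqrt(K)
    return delay_term, standard_term


def delay_term_dominates(A, K, total_delay):
    """Return True when the delay term exceeds the delay-free term.

    Comparing the two terms and cancelling H^3 * sqrt(S) gives
    sqrt(total_delay) / K > sqrt(A) / sqrt(K), i.e. total_delay > A * K.
    """
    return total_delay > A * K


if __name__ == "__main__":
    delay_term, standard_term = cce_rate_terms(H=2, S=4, A=4, K=100, total_delay=400)
    print(delay_term, standard_term)
    print(delay_term_dominates(A=4, K=100, total_delay=400))
```

Under these simplifications, the delay-dependent term is no larger than the \(1/\sqrt{K}\) term whenever \(\mathcal{T}_K \le AK\), which follows directly from comparing the two expressions.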
