Multi-agent Reinforcement Learning With Reward Delays
2022 · Yuyang Zhang, Runyu Zhang, Yuantao Gu, et al.
Abstract
This paper considers multi-agent reinforcement learning (MARL) where rewards are received after delays, and the delay time varies across agents and across time steps. Based on the V-learning framework, this paper proposes MARL algorithms that efficiently deal with reward delays. When the delays are finite, our algorithm reaches a coarse correlated equilibrium (CCE) with rate \(\tilde{\mathcal{O}}\left(\frac{H^3\sqrt{S\mathcal{T}_K}}{K}+\frac{H^3\sqrt{SA}}{\sqrt{K}}\right)\), where \(K\) is the number of episodes, \(H\) is the planning horizon, \(S\) is the size of the state space, \(A\) is the size of the largest action space, and \(\mathcal{T}_K\) is the measure of total delay formally defined in the paper. Moreover, our algorithm is extended to cases with infinite delays through a reward skipping scheme, where it achieves a convergence rate similar to that of the finite-delay case.
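To get a feel for how the two terms of the stated rate trade off, the following is a minimal sketch that evaluates each term numerically. It is not part of the paper: the function names are hypothetical, constants and logarithmic factors hidden by the tilde-O notation are ignored, and `total_delay` is simply a number standing in for \(\mathcal{T}_K\), whose formal definition is given in the paper and not reproduced here.

```python
import math


def cce_rate_terms(H, S, A, K, total_delay):
    """Evaluate the two terms of the stated rate, dropping constants and log factors.

    total_delay is a placeholder for T_K (assumed to be supplied by the caller).
    Returns (delay_term, sampling_term).
    """
    # First term: H^3 * sqrt(S * T_K) / K, the part that depends on the delays.
    delay_term = H**3 * math.sqrt(S * total_delay) / K
    # Second term: H^3 * sqrt(S * A) / sqrt(K), which has no delay dependence.
    sampling_term = H**3 * math.sqrt(S * A) / math.sqrt(K)
    return delay_term, sampling_term


def delay_dominates(H, S, A, K, total_delay):
    """Return True when the delay-dependent term exceeds the delay-free term."""
    delay_term, sampling_term = cce_rate_terms(H, S, A, K, total_delay)
    return delay_term > sampling_term


if __name__ == "__main__":
    d, s = cce_rate_terms(H=2, S=4, A=4, K=100, total_delay=25)
    print(f"delay term: {d:.3f}, sampling term: {s:.3f}")
```

Comparing the two expressions algebraically, the delay-dependent term exceeds the other only when \(\mathcal{T}_K > AK\), so under this reading of the bound the delays affect the leading-order rate only once the total delay grows faster than \(AK\).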
Related papers
- Hierarchical Deep Multiagent Reinforcement Learning With Temporal Abstraction (2018)
- Multi-agent Reinforcement Learning In Stochastic Networked Systems (2020)
- Delay-aware Multi-agent Reinforcement Learning For Cooperative And Competitive Environments (2020)
- V-learning -- A Simple, Efficient, Decentralized Algorithm For Multiagent RL (2021)
- Multi-agent Reinforcement Learning Via Adaptive Kalman Temporal Difference And Successor Representation (2021)
- Dealing With Non-stationarity In Decentralized Cooperative Multi-agent Deep Reinforcement Learning Via Multi-timescale Learning (2023)
- Scalable Multi-agent Reinforcement Learning For Networked Systems With Average Reward (2020)
- On Improving Model-free Algorithms For Decentralized Multi-agent Reinforcement Learning (2021)