MAC-PO: Multi-agent Experience Replay Via Collective Priority Optimization
2023 · Yongsheng Mei, Hanhan Zhou, Tian Lan, et al.
Abstract
Experience replay is crucial for off-policy reinforcement learning (RL) methods. By remembering and reusing experiences collected under different past policies, experience replay significantly improves the training efficiency and stability of RL algorithms. Many decision-making problems in practice naturally involve multiple agents and require multi-agent reinforcement learning (MARL) under the centralized training with decentralized execution paradigm. Nevertheless, existing MARL algorithms often adopt standard experience replay, in which transitions are sampled uniformly regardless of their importance. Finding prioritized sampling weights that are optimized for MARL experience replay has yet to be explored. To this end, we propose MAC-PO, which formulates optimal prioritized experience replay for multi-agent problems as a regret minimization over the sampling weights of transitions. This optimization is relaxed and solved using the Lagrangian multiplier approach to obtain the closed-form optimal samp
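The contrast the abstract draws between uniform and prioritized sampling can be illustrated with a minimal Python sketch. It is not MAC-PO's closed-form solution: the buffer contents, the `weights` values, and the function names are hypothetical placeholders chosen only to show how per-transition weights change which experiences are replayed.

```python
import random

def sample_uniform(buffer, batch_size, rng):
    """Standard replay: every stored transition is equally likely to be drawn."""
    return [rng.choice(buffer) for _ in range(batch_size)]

def sample_prioritized(buffer, weights, batch_size, rng):
    """Prioritized replay: transitions are drawn in proportion to their weights."""
    return rng.choices(buffer, weights=weights, k=batch_size)

rng = random.Random(0)

# Hypothetical transitions and weights (assumed for illustration only).
buffer = ["t0", "t1", "t2", "t3"]
weights = [0.0, 0.0, 1.0, 3.0]

uniform_batch = sample_uniform(buffer, 8, rng)
prioritized_batch = sample_prioritized(buffer, weights, 1000, rng)
```

Under these assumed weights, `t0` and `t1` are never replayed and `t3` is drawn roughly three times as often as `t2`, whereas the uniform sampler treats all four transitions alike.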
Related papers
- AccMER: Accelerating Multi-agent Experience Replay With Cache Locality-aware Prioritization (2023)
- Stabilising Experience Replay For Deep Multi-agent Reinforcement Learning (2017)
- Regret Minimization Experience Replay In Off-policy Reinforcement Learning (2021)
- Higher Replay Ratio Empowers Sample-efficient Multi-agent Reinforcement Learning (2024)
- MACRPO: Multi-agent Cooperative Recurrent Policy Optimization (2021)
- Prioritized Guidance For Efficient Multi-agent Reinforcement Learning Exploration (2019)
- Multi-agent Constrained Policy Optimisation (2021)
- Off-policy Correction For Deep Deterministic Policy Gradient Algorithms Via Batch Prioritized Experience Replay (2021)