Stabilising Experience Replay For Deep Multi-agent Reinforcement Learning
2017 · Jakob Foerster, Nantas Nardelli, Gregory Farquhar, et al.
Abstract
Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combinat
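The second method in the abstract, conditioning each agent's value function on a fingerprint, can be illustrated with a minimal sketch. The abstract does not say what the fingerprint contains, so this sketch assumes it is the pair (normalised training iteration, exploration rate epsilon). The names `with_fingerprint` and `FingerprintReplay` are hypothetical and are not taken from the paper's code.

```python
import random
from collections import deque


def with_fingerprint(observation, train_iteration, epsilon, max_iterations):
    """Return the observation with a two-entry fingerprint appended.

    Assumption: the fingerprint is (training iteration / max_iterations, epsilon).
    """
    return list(observation) + [train_iteration / max_iterations, epsilon]


class FingerprintReplay:
    """Replay memory whose stored observations carry the fingerprint
    that was current when the transition was collected."""

    def __init__(self, capacity, max_iterations):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.max_iterations = max_iterations

    def add(self, obs, action, reward, next_obs, train_iteration, epsilon):
        # The fingerprint is fixed at storage time, so a Q-network that
        # takes the augmented observation can tell old data from new data.
        fobs = with_fingerprint(obs, train_iteration, epsilon, self.max_iterations)
        fnext = with_fingerprint(next_obs, train_iteration, epsilon, self.max_iterations)
        self.buffer.append((fobs, action, reward, fnext))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)
```

In this sketch each agent's Q-network would take the augmented observation as input, so its input size grows by two. The first method, the importance-sampling correction, is not shown.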
Related papers
- Stratified Experience Replay: Correcting Multiplicity Bias In Off-policy Reinforcement Learning (2021)
- Higher Replay Ratio Empowers Sample-efficient Multi-agent Reinforcement Learning (2024)
- Replay For Safety (2021)
- Deep Reinforcement Learning For Multi-agent Systems: A Review Of Challenges, Solutions And Applications (2018)
- Deep Multiagent Reinforcement Learning: Challenges And Directions (2021)
- MAC-PO: Multi-agent Experience Replay Via Collective Priority Optimization (2023)
- Stable Continual Reinforcement Learning Via Diffusion-based Trajectory Replay (2024)
- Lenient Multi-agent Deep Reinforcement Learning (2017)