Taming Multi-agent Reinforcement Learning With Estimator Variance Reduction
2022 · Taher Jafferjee, Juliusz Ziomek, Tianpei Yang, et al.
Abstract
Centralised training with decentralised execution (CT-DE) serves as the foundation of many leading multi-agent reinforcement learning (MARL) algorithms. Despite its popularity, it suffers from a critical drawback due to its reliance on learning from a single sample of the joint-action at a given state. As agents explore and update their policies during training, these single samples may poorly represent the actual joint-policy of the system of agents, leading to high-variance gradient estimates that hinder learning. To address this problem, we propose an enhancement tool that accommodates any actor-critic MARL method. Our framework, Performance Enhancing Reinforcement Learning Apparatus (PERLA), introduces a sampling technique of the agents' joint-policy into the critics while the agents train. This leads to TD updates that closely approximate the true expected value under the current joint-policy rather than estimates from a single sample of the joint-action at a given state. This prod
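The variance argument in the abstract can be illustrated with a toy two-agent example. This is only a sketch of the general idea, not the paper's implementation: the critic table `q_next`, the policies `pi_self` and `pi_other`, the sample count `k`, and both function names are hypothetical choices made for illustration. One target uses a single sampled joint action; the other averages the critic over several sampled actions of the other agent.

```python
import numpy as np

def td_target_single(q_next, pi_self, pi_other, reward, gamma, rng):
    """TD target built from one sampled joint action (a_self, a_other)."""
    a_self = rng.choice(len(pi_self), p=pi_self)
    a_other = rng.choice(len(pi_other), p=pi_other)
    return reward + gamma * q_next[a_self, a_other]

def td_target_sampled(q_next, pi_self, pi_other, reward, gamma, rng, k=32):
    """TD target that averages the critic over k sampled actions of the
    other agent (a Monte Carlo estimate of the expectation under pi_other)."""
    a_self = rng.choice(len(pi_self), p=pi_self)
    a_others = rng.choice(len(pi_other), size=k, p=pi_other)
    return reward + gamma * q_next[a_self, a_others].mean()

# Hypothetical next-state critic values: rows index this agent's action,
# columns index the other agent's action.
q_next = np.array([
    [1.0, -2.0, 0.5, 3.0],
    [0.0, 2.0, -1.0, 1.0],
    [-1.5, 0.5, 2.5, -0.5],
])
pi_self = np.array([0.2, 0.5, 0.3])        # assumed policy of this agent
pi_other = np.array([0.1, 0.4, 0.4, 0.1])  # assumed policy of the other agent
reward, gamma = 1.0, 0.99

rng = np.random.default_rng(0)
single = np.array([
    td_target_single(q_next, pi_self, pi_other, reward, gamma, rng)
    for _ in range(2000)
])
sampled = np.array([
    td_target_sampled(q_next, pi_self, pi_other, reward, gamma, rng)
    for _ in range(2000)
])

# Exact expected target under the joint policy, for comparison.
expected = reward + gamma * (pi_self @ q_next @ pi_other)

print("expected target:      ", expected)
print("single-sample  mean/var:", single.mean(), single.var())
print("k-sample       mean/var:", sampled.mean(), sampled.var())
```

Both estimators are unbiased for the same expected target, but the averaged one removes most of the variance contributed by the other agent's action, which is the effect the abstract attributes to sampling the joint-policy inside the critic.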
Related papers
- Is Centralized Training With Decentralized Execution Framework Centralized Enough For MARL? (2023)
- CTDS: Centralized Teacher With Decentralized Student For Multi-agent Reinforcement Learning (2022)
- Sample And Communication Efficient Fully Decentralized MARL Policy Evaluation Via A New Approach: Local TD Update (2024)
- GTDE: Grouped Training With Decentralized Execution For Multi-agent Actor-critic (2024)
- Enhancing Sample Efficiency In Multi-agent RL With Uncertainty Quantification And Selective Exploration (2025)
- On Improving Model-free Algorithms For Decentralized Multi-agent Reinforcement Learning (2021)
- Tacit Learning With Adaptive Information Selection For Cooperative Multi-agent Reinforcement Learning (2024)
- Optimistic ε-greedy Exploration For Cooperative Multi-agent Reinforcement Learning (2025)