Cooperative Multi-agent Policy Gradients With Sub-optimal Demonstration
2018 · Peixi Peng, Junliang Xing
Abstract
Many real-world tasks, such as robot coordination, can be naturally modelled as multi-agent cooperative systems where the rewards are sparse. This paper focuses on learning decentralized policies for such tasks using sub-optimal demonstration. To learn multi-agent cooperation effectively and tackle the sub-optimality of the demonstration, a self-improving learning method is proposed. On the one hand, the centralized state-action values are initialized from the demonstration and updated by the learned decentralized policy to overcome its sub-optimality. On the other hand, the Nash equilibrium is found from the current state-action values and is used as a guide to learn the policy. The proposed method is evaluated on combat RTS games, which require a high level of multi-agent cooperation. Extensive experimental results on various combat scenarios demonstrate that the proposed method can learn multi-agent cooperation effectively. It significantly outperforms many state-of-the-art demonstration
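The abstract does not spell out how the Nash equilibrium is computed from the state-action values, so the following is only an illustrative sketch, not the authors' procedure. It assumes two agents, a single state, a small tabular centralized value shared by both agents (a common-payoff game), and pure strategies only. The names `pure_nash_equilibria` and `guiding_joint_action` are hypothetical.

```python
from itertools import product


def pure_nash_equilibria(q_joint):
    """Return every joint action (a1, a2) from which neither agent can raise
    the shared value q_joint[a1][a2] by changing only its own action.

    Assumption: q_joint is a hypothetical tabular centralized state-action
    value for one fixed state, indexed as q_joint[a1][a2].
    """
    n1, n2 = len(q_joint), len(q_joint[0])
    equilibria = []
    for a1, a2 in product(range(n1), range(n2)):
        value = q_joint[a1][a2]
        # Agent 1 deviates while agent 2 keeps a2 fixed.
        agent1_cannot_improve = all(q_joint[b][a2] <= value for b in range(n1))
        # Agent 2 deviates while agent 1 keeps a1 fixed.
        agent2_cannot_improve = all(q_joint[a1][b] <= value for b in range(n2))
        if agent1_cannot_improve and agent2_cannot_improve:
            equilibria.append((a1, a2))
    return equilibria


def guiding_joint_action(q_joint):
    """Pick the highest-valued pure equilibrium as a per-agent action target.

    In a finite common-payoff game the global maximum of q_joint is always a
    pure equilibrium, so the list below is never empty.
    """
    equilibria = pure_nash_equilibria(q_joint)
    return max(equilibria, key=lambda joint: q_joint[joint[0]][joint[1]])


# Hypothetical values: both agents must pick the same action to score.
q_example = [[5.0, 0.0],
             [0.0, 3.0]]
equilibria_example = pure_nash_equilibria(q_example)
guide_example = guiding_joint_action(q_example)
```

In this toy table both coordinated joint actions are equilibria, and the higher-valued one is selected as the guide. Each agent's component of that joint action could then serve as a target for its own decentralized policy, which is one plausible reading of "used as a guide" in the abstract.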
Related papers
- Multi-agent Cooperation Through Learning-aware Policy Gradients (2024)
- Counterfactual Multi-agent Policy Gradients (2017)
- Multi-agent Interactions Modeling With Correlated Policies (2020)
- Scalable Centralized Deep Multi-agent Reinforcement Learning Via Policy Gradients (2018)
- A Policy Gradient Algorithm For Learning To Learn In Multiagent Reinforcement Learning (2020)
- Optimistic ε-greedy Exploration For Cooperative Multi-agent Reinforcement Learning (2025)
- Policy Gradient From Demonstration And Curiosity (2020)
- Multi-agent Actor-critic For Mixed Cooperative-competitive Environments (2017)