Discovering Diverse Multi-agent Strategic Behavior Via Reward Randomization
2021 · Zhenggang Tang, Chao Yu, Boyuan Chen, et al.
Abstract
We propose a simple, general, and effective technique, Reward Randomization, for discovering diverse strategic policies in complex multi-agent games. Combining reward randomization with policy gradient, we derive a new algorithm, Reward-Randomized Policy Gradient (RPG). RPG discovers multiple distinctive, human-interpretable strategies in challenging temporal trust dilemmas, including grid-world games and the real-world game Agar.io. In these games multiple equilibria exist, but standard multi-agent policy gradient algorithms always converge to a fixed one with a sub-optimal payoff for every player, even when using state-of-the-art exploration techniques. Furthermore, with the set of diverse strategies from RPG, we can (1) achieve higher payoffs by fine-tuning the best policy from the set; and (2) obtain an adaptive agent by using this set of strategies as its training opponents. The source code and example videos can be found on our website: https://sites.google.com/view/staghuntrpg.
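The idea in the abstract can be illustrated on a one-shot stag hunt. The sketch below is a toy illustration under stated assumptions, not the authors' implementation: the payoff numbers, the uniform sampling range `scale`, the single shared logit, and the use of exact gradients in place of sampled policy gradients are all choices made here for brevity. It trains one policy per randomly perturbed payoff matrix, then scores every resulting policy in the original game and keeps the best one. On the original payoffs, the gradient at the uniform starting policy points toward Hare, which is the lower-payoff equilibrium.

```python
import math
import random

# Row player's payoffs, indexed [own action][other's action].
# Action 0 = Stag, action 1 = Hare. These numbers are illustrative assumptions.
ORIGINAL = [[4.0, -10.0],
            [3.0, 1.0]]


def expected_payoff(payoff, p, q):
    """Expected payoff of playing Stag with probability p against probability q."""
    return (p * q * payoff[0][0] + p * (1 - q) * payoff[0][1]
            + (1 - p) * q * payoff[1][0] + (1 - p) * (1 - q) * payoff[1][1])


def train(payoff, steps=2000, lr=0.1, theta=0.0):
    """Self-play gradient ascent on one shared logit; returns the final P(Stag).

    Assumption: exact gradients stand in for sampled policy-gradient estimates.
    """
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-theta))
        # Derivative of the payoff w.r.t. own p, with the opponent fixed at q = p.
        grad_p = (p * (payoff[0][0] - payoff[1][0])
                  + (1 - p) * (payoff[0][1] - payoff[1][1]))
        theta += lr * grad_p * p * (1 - p)  # chain rule through the sigmoid
    return 1.0 / (1.0 + math.exp(-theta))


def reward_randomized_search(num_samples=20, scale=10.0, seed=0):
    """Train on perturbed payoffs, then pick the policy that is best in ORIGINAL."""
    rng = random.Random(seed)
    baseline = train(ORIGINAL)  # plain gradient ascent on the true payoffs
    candidates = [baseline]
    for _ in range(num_samples):
        # Hypothetical randomization scheme: every payoff entry drawn uniformly.
        perturbed = [[rng.uniform(-scale, scale) for _ in range(2)]
                     for _ in range(2)]
        candidates.append(train(perturbed))
    # Evaluation always uses the original game, never the perturbed one.
    best = max(candidates, key=lambda p: expected_payoff(ORIGINAL, p, p))
    return baseline, best


if __name__ == "__main__":
    base, best = reward_randomized_search()
    print("baseline P(Stag):", base, "payoff:", expected_payoff(ORIGINAL, base, base))
    print("selected P(Stag):", best, "payoff:", expected_payoff(ORIGINAL, best, best))
```

Because the baseline policy is included among the candidates, the selected policy can never score worse than the baseline in the original game. The fine-tuning and adaptive-agent steps mentioned in the abstract are not covered by this sketch.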
Related papers
- Continuously Discovering Novel Strategies Via Reward-switching Policy Optimization (2022)
- Robust And Diverse Multi-agent Learning Via Rational Policy Gradient (2025)
- DGPO: Discovering Multiple Strategies With Diversity-guided Policy Optimization (2022)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Cooperative Multi-agent Policy Gradients With Sub-optimal Demonstration (2018)
- A Generalized Training Approach For Multiagent Learning (2019)
- Policy Gradient From Demonstration And Curiosity (2020)
- Scalable And Sample Efficient Distributed Policy Gradient Algorithms In Multi-agent Networked Systems (2022)