Mastering The Game Of No-press Diplomacy Via Human-regularized Reinforcement Learning And Planning
2022 · Anton Bakhtin, David J Wu, Adam Lerer, et al.
Abstract
No-press Diplomacy is a complex strategy game involving both cooperation and competition that has served as a benchmark for multi-agent AI research. While self-play reinforcement learning has resulted in numerous successes in purely adversarial games like chess, Go, and poker, self-play alone is insufficient for achieving optimal performance in domains involving cooperation with humans. We address this shortcoming by first introducing a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy. We prove that this is a no-regret learning algorithm under a modified utility function. We then show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL that provides a model of human play while simultaneously training an agent that responds well to this human model. We used RL-DiL-piKL to train an agent we name Diplodocus. In a 200-game no-press Diplomacy tournament involving 62 human p
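The idea of regularizing a reward-maximizing policy toward a human imitation-learned policy can be illustrated with a single-decision sketch. This is not the authors' DiL-piKL procedure, which the abstract describes only at a high level. It assumes the standard closed form for maximizing expected value minus a KL penalty toward an anchor policy, namely a policy proportional to anchor(a) * exp(Q(a) / lam). The function name `kl_regularized_policy`, its parameters, and the example numbers are all hypothetical.

```python
import math


def kl_regularized_policy(q_values, anchor_policy, lam):
    """Return the policy maximizing E_pi[Q] - lam * KL(pi || anchor).

    Illustrative assumption: a single decision with known action values.
    The maximizer is proportional to anchor(a) * exp(Q(a) / lam).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if len(q_values) != len(anchor_policy):
        raise ValueError("q_values and anchor_policy must have equal length")

    # Work in log space so that small lam does not overflow exp().
    logits = [
        math.log(p) + q / lam if p > 0 else float("-inf")
        for q, p in zip(q_values, anchor_policy)
    ]
    max_logit = max(logits)
    weights = [math.exp(logit - max_logit) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical numbers: action 0 has the higher value, but the anchor
# (standing in for a human imitation policy) prefers action 1.
q = [1.0, 0.0]
anchor = [0.2, 0.8]

near_anchor = kl_regularized_policy(q, anchor, lam=1e6)   # strong regularization
near_greedy = kl_regularized_policy(q, anchor, lam=0.01)  # weak regularization
```

With a large `lam` the result stays close to the anchor policy, and with a small `lam` it approaches the reward-maximizing action, which is the trade-off the regularization weight controls in this sketch.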
Related papers
- Learning To Play No-press Diplomacy With Best Response Policy Iteration (2020)
- Modeling Strong And Human-like Gameplay With KL-regularized Search (2021)
- Human-AI Coordination Via Human-regularized Search And Learning (2022)
- A Human Mixed Strategy Approach To Deep Reinforcement Learning (2018)
- Human-level Reinforcement Learning Through Theory-based Modeling, Exploration, And Planning (2021)
- Learning Multiagent Coordination In The Absence Of Communication Channels (2018)
- Towards Cooperation In Sequential Prisoner's Dilemmas: A Deep Multiagent Reinforcement Learning Approach (2018)
- Reinforcement Learning On Human Decision Models For Uniquely Collaborative AI Teammates (2021)