Efficient Competitive Self-play Policy Optimization
2020 · Yuanyi Zhong, Yuan Zhou, Jian Peng
Abstract
Reinforcement learning from self-play has recently achieved many successes. Self-play, in which agents compete against copies of themselves, is often used to generate training data for iterative policy improvement. Previous work relies on heuristic rules to choose an opponent for the current learner; typical rules pick the latest agent, the best agent, or a random historical agent. However, these rules can be inefficient in practice and sometimes fail to guarantee convergence even in the simplest matrix games. In this paper, we propose a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games. We observe that the Nash equilibrium coincides with the saddle point of the stochastic payoff function, which motivates us to borrow ideas from the classical saddle point optimization literature. Our method trains several agents simultaneously, and the agents intelligently take each other as opponents based on simple adversarial rules derived from
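The link between Nash equilibrium and saddle point can be illustrated on a small matrix game. The sketch below is not the authors' algorithm: it assumes rock-paper-scissors as the game, and the names `payoff`, `exploitability`, and `most_adversarial` are hypothetical helpers introduced only for illustration. It computes the duality gap of a strategy pair, which is zero exactly at a saddle point, and shows one plausible reading of an "adversarial" opponent rule: pick from a population the opponent against which the learner currently does worst.

```python
# Rock-paper-scissors payoff for the row player (rows/cols: rock, paper, scissors).
# The game is zero-sum, so the column player receives the negative of this value.
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def payoff(x, y):
    """Expected row-player payoff x^T A y for mixed strategies x and y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(3) for j in range(3))

def exploitability(x, y):
    """Duality gap max_x' f(x', y) - min_y' f(x, y'); it is zero at a saddle point."""
    pures = [[1.0 if k == i else 0.0 for k in range(3)] for i in range(3)]
    best_row = max(payoff(p, y) for p in pures)   # best response to y
    best_col = min(payoff(x, p) for p in pures)   # best response to x
    return best_row - best_col

def most_adversarial(x, opponents):
    """Hypothetical rule (an assumption, not taken from the paper):
    return the index of the opponent that minimizes the learner's payoff."""
    return min(range(len(opponents)), key=lambda j: payoff(x, opponents[j]))

uniform = [1 / 3, 1 / 3, 1 / 3]
rock = [1.0, 0.0, 0.0]
population = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

gap_at_nash = exploitability(uniform, uniform)   # uniform play is the equilibrium
gap_at_rock = exploitability(rock, uniform)      # always-rock can be exploited
hardest = most_adversarial(rock, population)     # index of the hardest opponent
```

Here the uniform strategy pair has no gap, while a pure "rock" learner leaves a positive gap and is matched against the "paper" opponent. Training against the hardest opponent in a population, rather than the latest or a random one, is the kind of choice the abstract contrasts with heuristic rules.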
Related papers
- Fictitious Cross-play: Learning Global Nash Equilibrium In Mixed Cooperative-competitive Games (2023)
- A Sharp Analysis Of Model-based Reinforcement Learning With Self-play (2020)
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)
- A Minimaximalist Approach To Reinforcement Learning From Human Feedback (2024)
- All By Myself: Learning Individualized Competitive Behaviour With A Contrastive Reinforcement Learning Optimization (2023)
- Provably Efficient Fictitious Play Policy Optimization For Zero-sum Markov Games With Structured Transitions (2022)
- Efficient Use Of Heuristics For Accelerating XCS-based Policy Learning In Markov Games (2020)
- Offline Fictitious Self-play For Competitive Games (2024)