Fictitious Cross-play: Learning Global Nash Equilibrium In Mixed Cooperative-competitive Games
2023 · Zelai Xu, Yancheng Liang, Chao Yu, et al.
Abstract
Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes its policy by treating the others as part of the environment. Despite its empirical successes, the theoretical properties of SP-based methods are limited to two-player zero-sum games. For mixed cooperative-competitive games, where agents on the same team need to cooperate with each other, we can show a simple counter-example in which SP-based methods fail, with high probability, to converge to a global Nash equilibrium (NE). Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, in which best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses, with all convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch until convergence, which makes it hard
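To make the idea of iteratively best-responding to previous policies concrete, here is a minimal sketch of classic fictitious play on rock-paper-scissors, a two-player zero-sum matrix game. This is not the paper's Fictitious Cross-Play algorithm and not PSRO; it is a textbook illustration only. The names `PAYOFF`, `best_response`, and `fictitious_play` are hypothetical and chosen for this example.

```python
# Classic fictitious play on rock-paper-scissors (illustrative sketch only;
# NOT the paper's Fictitious Cross-Play and NOT PSRO).

# Row player's payoff; the column player receives the negative (zero-sum).
# Action order: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def best_response(payoff_rows, opponent_counts):
    """Return the action maximizing payoff against the opponent's empirical play.

    payoff_rows[a][b] is this player's payoff for action a against opponent
    action b. Ties are broken in favor of the lowest action index.
    """
    values = [
        sum(p * c for p, c in zip(row, opponent_counts))
        for row in payoff_rows
    ]
    return max(range(len(values)), key=values.__getitem__)


def fictitious_play(payoff, iterations=20000):
    """Run simultaneous fictitious play and return both empirical mixtures."""
    n = len(payoff)
    # Column player's payoffs, indexed [column action][row action].
    col_payoff = [[-payoff[i][j] for i in range(n)] for j in range(n)]
    # Assumption: both players start with one fictitious observation of action 0.
    row_counts = [1] + [0] * (n - 1)
    col_counts = [1] + [0] * (n - 1)
    for _ in range(iterations):
        # Each player best-responds to the other's history of past actions.
        r = best_response(payoff, col_counts)
        c = best_response(col_payoff, row_counts)
        row_counts[r] += 1
        col_counts[c] += 1
    row_total = sum(row_counts)
    col_total = sum(col_counts)
    return (
        [x / row_total for x in row_counts],
        [x / col_total for x in col_counts],
    )


row_mix, col_mix = fictitious_play(PAYOFF)
print("row player's empirical mixture:", row_mix)
print("column player's empirical mixture:", col_mix)
```

In two-player zero-sum games such as this one, the empirical action frequencies of fictitious play are known to approach a Nash equilibrium, which here is the uniform mixture over the three actions. The abstract's point is that this kind of guarantee does not carry over automatically once teammates must also coordinate with each other.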
Related papers
- Efficient Competitive Self-play Policy Optimization (2020)
- A Generalized Training Approach For Multiagent Learning (2019)
- Offline Fictitious Self-play For Competitive Games (2024)
- Role Play: Learning Adaptive Role-specific Strategies In Multi-agent Interactions (2024)
- Learning Equilibria In Mean-field Games: Introducing Mean-field PSRO (2021)
- Multi-agent Training Beyond Zero-sum With Correlated Equilibrium Meta-solvers (2021)
- Generalized Beliefs For Cooperative AI (2022)
- Neural Auto-curricula (2021)