Model-based Multi-agent Policy Optimization With Adaptive Opponent-wise Rollouts
2021 · Weinan Zhang, Xihuai Wang, Jian Shen, et al.
Abstract
This paper investigates model-based methods in multi-agent reinforcement learning (MARL). We define the dynamics sample complexity and the opponent sample complexity in MARL and conduct a theoretical analysis of the upper bound on the return discrepancy. To reduce this upper bound while keeping sample complexity low throughout the learning process, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its own multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with adaptive opponent-wise rollouts. We further prove the convergence of AORPO under reasonable assumptions. Experiments on competitive and cooperative tasks demonstrate that AORPO achieves better sample efficiency than the compared MARL methods, with comparable asymptotic performance.
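The abstract does not spell out how an adaptive opponent-wise rollout is scheduled. The sketch below is one hedged reading, not AORPO's published algorithm. It assumes that each agent scores every opponent model with a positive error estimate, that a less accurate opponent model is used for fewer simulated steps, and that the real opponent's action can be queried once its model is switched off. The length rule in `opponent_rollout_lengths`, the function names, and the toy dynamics are all hypothetical.

```python
import math
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical type aliases for this sketch: states and actions are left abstract.
State = Any
Action = Any
Policy = Callable[[State], Action]
# A learned dynamics model: (state, own action, opponents' actions) -> (next state, reward).
Dynamics = Callable[[State, Action, Dict[str, Action]], Tuple[State, float]]


def opponent_rollout_lengths(model_errors: Dict[str, float], k: int) -> Dict[str, int]:
    """Assign each opponent model a rollout length (an assumed rule, not the paper's).

    The most accurate opponent model gets the full horizon k; a model whose
    error is m times larger gets floor(k / m) steps.
    """
    if any(err <= 0 for err in model_errors.values()):
        raise ValueError("model errors must be positive")
    eps_min = min(model_errors.values())
    return {j: math.floor(k * eps_min / err) for j, err in model_errors.items()}


def adaptive_opponent_wise_rollout(
    s0: State,
    policy: Policy,
    dynamics: Dynamics,
    opponent_models: Dict[str, Policy],
    real_opponents: Dict[str, Policy],
    lengths: Dict[str, int],
    k: int,
) -> List[Tuple[State, Action, Dict[str, Action], float, State]]:
    """Roll out k steps in the learned dynamics model.

    Opponent j's action comes from its learned model for the first lengths[j]
    steps and from the real opponent policy afterwards (assumed queryable).
    """
    trajectory = []
    s = s0
    for t in range(k):
        a_self = policy(s)
        # Pick, per opponent, whether to simulate its action or ask the real opponent.
        joint = {
            j: (opponent_models[j](s) if t < lengths[j] else real_opponents[j](s))
            for j in opponent_models
        }
        s_next, r = dynamics(s, a_self, joint)
        trajectory.append((s, a_self, joint, r, s_next))
        s = s_next
    return trajectory


# Toy usage: integer states, string actions tagging where each action came from.
lengths = opponent_rollout_lengths({"b": 0.5, "c": 1.0, "d": 2.0}, k=4)
print(lengths)  # {'b': 4, 'c': 2, 'd': 1}

traj = adaptive_opponent_wise_rollout(
    s0=0,
    policy=lambda s: "self",
    dynamics=lambda s, a, joint: (s + 1, 0.0),
    opponent_models={j: (lambda s: "model") for j in lengths},
    real_opponents={j: (lambda s: "real") for j in lengths},
    lengths=lengths,
    k=4,
)
print([step[2]["c"] for step in traj])  # ['model', 'model', 'real', 'real']
```

Scaling each opponent's rollout length by its relative model error keeps the most reliable opponent model in use for the whole horizon while limiting how far errors from weaker opponent models can compound. This mirrors the abstract's goal of trading off dynamics and opponent sample complexity.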
Related papers
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)
- Learning To Model Opponent Learning (2020)
- Model-based Multi-agent Reinforcement Learning: Recent Progress And Prospects (2022)
- Offline Multi-agent Reinforcement Learning Via In-sample Sequential Policy Optimization (2024)
- Multi-agent Trust Region Policy Optimization (2020)
- On Improving Model-free Algorithms For Decentralized Multi-agent Reinforcement Learning (2021)
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025)
- Adaptive Opponent Policy Detection In Multi-agent MDPs: Real-time Strategy Switch Identification Using Running Error Estimation (2024)