Multi-agent Trust Region Policy Optimization
2020 · Hepeng Li, Haibo He
Abstract
We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases. By making a series of approximations to the consensus optimization model, we propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO). This algorithm can optimize distributed policies based on local observations and private rewards. The agents do not need to know observations, rewards, policies or value/action-value functions of other agents. The agents only share a likelihood ratio with their neighbors during the training process. The algorithm is fully decentralized and privacy-preserving. Our experiments on two cooperative games demonstrate its robust performance on complicated MARL tasks.
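The quantity the agents exchange, the likelihood ratio between the updated and the previous policy, can be illustrated with a small sketch. This is not the MATRPO update itself: the abstract does not give the consensus formulation, so the neighbor-averaging step below is a generic consensus-averaging stand-in, and the function names, the categorical policy, and the adjacency matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def likelihood_ratio(new_probs, old_probs, actions):
    """Per-sample ratio pi_new(a|o) / pi_old(a|o) for a categorical policy.

    new_probs, old_probs: arrays of shape (n_samples, n_actions).
    actions: integer array of shape (n_samples,) with the sampled actions.
    """
    idx = np.arange(len(actions))
    return new_probs[idx, actions] / old_probs[idx, actions]

def surrogate(ratio, advantages):
    """TRPO-style surrogate objective: mean of ratio-weighted advantages."""
    return float(np.mean(ratio * advantages))

def neighbor_average(ratios, adjacency):
    """One generic consensus-averaging step (an assumption, not the paper's rule).

    ratios: array of shape (n_agents, n_samples), one ratio vector per agent.
    adjacency: symmetric 0/1 matrix of shape (n_agents, n_agents), no self-loops.
    Each agent replaces its vector with the mean over itself and its neighbors.
    """
    weights = adjacency + np.eye(len(adjacency))
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ ratios

# Hypothetical data: two agents, two samples, two actions each.
old = np.array([[0.5, 0.5], [0.25, 0.75]])
new = np.array([[0.6, 0.4], [0.5, 0.5]])
actions = np.array([0, 1])
advantages = np.array([1.0, -1.0])  # each agent's private advantage estimates

r_agent0 = likelihood_ratio(new, old, actions)
r_agent1 = likelihood_ratio(old, old, actions)  # an unchanged policy gives ratio 1
local_objective = surrogate(r_agent0, advantages)

# Only the ratio vectors cross the agent boundary here.
shared = neighbor_average(np.stack([r_agent0, r_agent1]),
                          np.array([[0, 1], [1, 0]]))
```

In this sketch only the ratio vectors are exchanged; observations, advantages, and policy parameters stay local to each agent, which mirrors the privacy property the abstract describes.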
Related papers
- Trust Region Policy Optimisation In Multi-agent Reinforcement Learning (2021)
- Trust Region Bounds For Decentralized PPO Under Non-stationarity (2022)
- Adaptive Trust Region Policy Optimization: Global Convergence And Faster Rates For Regularized MDPs (2019)
- Multi-agent Constrained Policy Optimisation (2021)
- Cooperative Multi-agent Reinforcement Learning With Partial Observations (2020)
- Multi-agent Guided Policy Optimization (2025)
- Heterogeneous Multi-agent Reinforcement Learning Via Mirror Descent Policy Optimization (2023)
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)