Dealing With Non-stationarity In MARL Via Trust-region Decomposition
2021 · Wenhao Li, Xiangfeng Wang, Bo Jin, et al.
Abstract
Non-stationarity is a thorny issue in cooperative multi-agent reinforcement learning (MARL). One cause is that agents' policies change during the learning process. Existing works have discussed various consequences of non-stationarity using several kinds of measurement indicators, which makes the objectives of existing algorithms inevitably inconsistent and disparate. In this paper, we introduce a novel notion, the \(\delta\)-measurement, to explicitly measure the non-stationarity of a policy sequence, which can further be proved to be bounded by the KL-divergence of consecutive joint policies. A straightforward but highly non-trivial approach is to control the joint policies' divergence by imposing a trust-region constraint on the joint policy, but this divergence is difficult to estimate accurately. Although decomposing the joint policy and imposing trust-region constraints on the factorized policies has lower computational complexity, simple policy factorization l
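As a rough illustration of the quantities the abstract refers to, the sketch below computes the KL-divergence between two consecutive joint policies in a single state and compares it with the per-agent divergences. It is not the paper's algorithm. It assumes a fully independent factorization of the joint policy (in which case the joint KL equals the sum of the per-agent KLs), and the two-agent policies and the helper names `categorical_kl` and `joint_policy` are hypothetical.

```python
import numpy as np

def categorical_kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def joint_policy(per_agent):
    """Joint action distribution of independent agents (flattened outer product)."""
    joint = np.array([1.0])
    for probs in per_agent:
        joint = np.outer(joint, probs).ravel()
    return joint

# Hypothetical per-agent policies in one state, before and after an update.
old = [np.array([0.5, 0.5]), np.array([0.2, 0.3, 0.5])]
new = [np.array([0.6, 0.4]), np.array([0.25, 0.25, 0.5])]

# Divergence of each factorized policy, and of the induced joint policy.
per_agent_kl = [categorical_kl(o, n) for o, n in zip(old, new)]
joint_kl = categorical_kl(joint_policy(old), joint_policy(new))

print("per-agent KL:", per_agent_kl)
print("joint KL:    ", joint_kl)
```

Under this independence assumption, bounding each agent's KL by its own budget bounds the joint KL by the sum of those budgets. A factorization that is only an approximation of the joint policy does not have this guarantee.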
Related papers
- Dealing With Non-stationarity In Decentralized Cooperative Multi-agent Deep Reinforcement Learning Via Multi-timescale Learning (2023)
- Trust Region Bounds For Decentralized PPO Under Non-stationarity (2022)
- Non-stationary Policy Learning For Multi-timescale Multi-agent Reinforcement Learning (2023)
- Trust Region Policy Optimisation In Multi-agent Reinforcement Learning (2021)
- Multi-agent Trust Region Policy Optimization (2020)
- Remembering The Markov Property In Cooperative MARL (2025)
- Multi-agent Reinforcement Learning In Stochastic Networked Systems (2020)
- Hierarchical Deep Multiagent Reinforcement Learning With Temporal Abstraction (2018)