More Centralized Training, Still Decentralized Execution: Multi-agent Conditional Policy Factorization
2022 · Jiangxing Wang, Deheng Ye, Zongqing Lu
Abstract
In cooperative multi-agent reinforcement learning (MARL), combining value decomposition with actor-critic enables agents to learn stochastic policies, which are better suited to partially observable environments. Because the goal is to learn local policies that allow decentralized execution, agents are commonly assumed to be independent of each other, even during centralized training. However, this assumption may prevent agents from learning the optimal joint policy. To address this problem, we explicitly take the dependency among agents into account in centralized training. Although this leads to the optimal joint policy, that policy may not be factorizable for decentralized execution. Nevertheless, we show theoretically that from such a joint policy we can always derive another joint policy that achieves the same optimality but can be factorized for decentralized execution. To this end, we propose multi-agent conditional policy factorization (MACPF), which uses more centralized training but still enables decentralized execution.
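The two claims in the abstract (that assuming independence can rule out the optimal joint policy, and that an equally good factorizable policy can be derived from a dependent one) can be illustrated on a toy example. The sketch below assumes a hypothetical two-agent, two-action, one-step coordination game (not taken from the paper) in which the team is rewarded only when both agents choose the same action. The dependent joint policy is written with the chain rule, p(a1, a2) = π1(a1) π2(a2 | a1). Independent policies with the same marginals can only represent the product of those marginals, which loses half the reward. Greedily fixing agent 1's action and then agent 2's conditional reply gives a deterministic, factorizable policy with the same return. This is only an illustration under these assumptions, not the MACPF training procedure; the names `PAYOFF`, `expected_return`, `is_product`, and the greedy derivation step are hypothetical.

```python
import numpy as np

# Hypothetical one-step coordination game (an assumption, not from the paper):
# rows are agent 1's action, columns are agent 2's action, and the team gets
# reward 1 only when both agents pick the same action.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])


def expected_return(joint_probs: np.ndarray) -> float:
    """Expected team reward of a joint distribution p(a1, a2)."""
    return float((joint_probs * PAYOFF).sum())


def is_product(joint_probs: np.ndarray) -> bool:
    """True if p(a1, a2) equals the product of its marginals (i.e. is factorizable)."""
    marginal_1 = joint_probs.sum(axis=1)
    marginal_2 = joint_probs.sum(axis=0)
    return bool(np.allclose(joint_probs, np.outer(marginal_1, marginal_2)))


# Dependent joint policy via the chain rule: p(a1, a2) = pi_1(a1) * pi_2(a2 | a1).
# Agent 1 mixes uniformly; agent 2 conditions on agent 1's action and copies it.
pi1 = np.array([0.5, 0.5])
pi2_given_a1 = np.array([[1.0, 0.0],   # pi_2(. | a1 = 0)
                         [0.0, 1.0]])  # pi_2(. | a1 = 1)
dependent = pi1[:, None] * pi2_given_a1

# Independent policies with the same marginals can only represent their product.
product_of_marginals = np.outer(dependent.sum(axis=1), dependent.sum(axis=0))

# Derive a factorizable policy from the dependent one (hypothetical greedy step):
# fix agent 1's most likely action, then agent 2's most likely reply to it.
a1 = int(np.argmax(pi1))
a2 = int(np.argmax(pi2_given_a1[a1]))
factorized = np.zeros_like(dependent)
factorized[a1, a2] = 1.0

print(expected_return(dependent), is_product(dependent))                        # 1.0 False
print(expected_return(product_of_marginals), is_product(product_of_marginals))  # 0.5 True
print(expected_return(factorized), is_product(factorized))                      # 1.0 True
```

The greedy step works in this toy game because every joint action in the support of an optimal one-step policy is itself optimal, so any one of them can be executed by agents acting independently. The abstract's theoretical claim concerns the general MARL setting, which this one-step, fully observable example does not capture.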
Related papers
- AgentMixer: Multi-agent Correlated Policy Factorization (2024)
- QFree: A Universal Value Function Factorization For Multi-agent Reinforcement Learning (2023)
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019)
- PAC: Assisted Value Factorisation With Counterfactual Predictions In Multi-agent Reinforcement Learning (2022)
- Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2020)
- Towards Understanding Cooperative Multi-agent Q-learning With Value Factorization (2020)
- Beyond Monotonicity: Revisiting Factorization Principles In Multi-agent Q-learning (2025)
- Is Centralized Training With Decentralized Execution Framework Centralized Enough For MARL? (2023)