Towards Global Optimality In Cooperative MARL With The Transformation And Distillation Framework
2022 · Jianing Ye, Chenghao Li, Yongqiang Dou, et al.
Abstract
Decentralized execution is one core demand in multi-agent reinforcement learning (MARL). Recently, most popular MARL algorithms have adopted decentralized policies to enable decentralized execution, and use gradient descent as the optimizer. However, there is hardly any theoretical analysis of these algorithms taking the optimization method into consideration, and we find that various popular MARL algorithms with decentralized policies are suboptimal in toy tasks when gradient descent is chosen as their optimization method. In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods, and prove their suboptimality when gradient descent is used. To address the suboptimality issue, we propose the Transformation And Distillation (TAD) framework, which reformulates a multi-agent MDP as a special single-agent MDP with a sequential structure and enables decentralized execution by di
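The suboptimality claim in the abstract can be illustrated with a minimal sketch. The 2x2 cooperative matrix game below is a hypothetical example chosen for illustration, not a task taken from the paper, and the names `R`, `train`, and `expected_return` are assumptions. Each agent holds one logit for its own action and follows the exact gradient of the shared expected payoff. Started from a poor initialization, the two decentralized policies settle on the joint action worth 0.5 rather than the globally optimal one worth 1.0.

```python
import math

# Hypothetical 2x2 cooperative payoff (not from the paper): both agents
# receive R[a1][a2]. Joint action (0, 0) is globally optimal.
R = [[1.0, 0.0],
     [0.0, 0.5]]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def expected_return(p, q):
    """Expected payoff when agent 1 plays action 0 w.p. p and agent 2 w.p. q."""
    return (p * q * R[0][0] + p * (1 - q) * R[0][1]
            + (1 - p) * q * R[1][0] + (1 - p) * (1 - q) * R[1][1])

def train(theta1, theta2, lr=1.0, steps=2000):
    """Exact gradient ascent, each agent updating only its own logit."""
    for _ in range(steps):
        p, q = sigmoid(theta1), sigmoid(theta2)
        # Partial derivatives of the expected payoff w.r.t. p and q.
        dJ_dp = q * (R[0][0] - R[1][0]) + (1 - q) * (R[0][1] - R[1][1])
        dJ_dq = p * (R[0][0] - R[0][1]) + (1 - p) * (R[1][0] - R[1][1])
        # Chain rule through the sigmoid parameterization.
        theta1 += lr * dJ_dp * p * (1 - p)
        theta2 += lr * dJ_dq * q * (1 - q)
    return sigmoid(theta1), sigmoid(theta2)

# Poor initialization: each agent picks action 0 with probability 0.2.
bad_logit = math.log(0.2 / 0.8)
p, q = train(bad_logit, bad_logit)
final_return = expected_return(p, q)

# Favorable initialization: each agent picks action 0 with probability 0.8.
good_logit = math.log(0.8 / 0.2)
p_good, q_good = train(good_logit, good_logit)
good_return = expected_return(p_good, q_good)

print(f"poor init: return {final_return:.3f}, good init: return {good_return:.3f}")
```

In this sketch the outcome depends only on the initialization: the gradient for one agent is negative whenever the other agent plays action 0 with probability below 1/3, so both agents reinforce the inferior joint action.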
Related papers
- Revisiting Some Common Practices In Cooperative Multi-agent Reinforcement Learning (2022)
- Is Centralized Training With Decentralized Execution Framework Centralized Enough For MARL? (2023)
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019)
- Value Propagation For Decentralized Networked Deep Multi-agent Reinforcement Learning (2019)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic For Cooperative Multi-agent Reinforcement Learning (2020)
- On Improving Model-free Algorithms For Decentralized Multi-agent Reinforcement Learning (2021)
- Multi-agent Guided Policy Optimization (2025)
- Multi-agent Trust Region Policy Optimization (2020)