Heterogeneous Multi-agent Reinforcement Learning Via Mirror Descent Policy Optimization
2023 · Mohammad Mehdi Nasiri, Mansoor Rezghi
Abstract
This paper presents an extension of the Mirror Descent method to overcome challenges in cooperative Multi-Agent Reinforcement Learning (MARL) settings, where agents have varying abilities and individual policies. The proposed Heterogeneous-Agent Mirror Descent Policy Optimization (HAMDPO) algorithm utilizes the multi-agent advantage decomposition lemma to enable efficient policy updates for each agent while ensuring overall performance improvements. By iteratively updating agent policies through an approximate solution of the trust-region problem, HAMDPO guarantees stability and improves performance. Moreover, the HAMDPO algorithm is capable of handling both continuous and discrete action spaces for heterogeneous agents in various MARL problems. We evaluate HAMDPO on Multi-Agent MuJoCo and StarCraft II tasks, demonstrating its superiority over state-of-the-art algorithms such as HATRPO and HAPPO. These results suggest that HAMDPO is a promising approach for solving cooperative MARL prob
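To illustrate the kind of update the abstract refers to, the sketch below shows a single mirror descent step for one agent in one state, in the tabular, discrete-action case. It is not the paper's implementation: the function name `mirror_descent_step`, the `step_size` parameter, and the closed-form solution used here are assumptions for illustration, and a practical method with neural-network policies would typically approximate this step with gradient updates instead.

```python
import math


def mirror_descent_step(pi_old, advantages, step_size):
    """One KL-regularized (mirror descent) policy update for a single state.

    Solves  max_pi  <pi, A> - (1 / step_size) * KL(pi || pi_old)
    in closed form: pi_new(a) is proportional to pi_old(a) * exp(step_size * A(a)).

    Assumes every entry of pi_old is strictly positive (hypothetical helper,
    tabular case only).
    """
    logits = [math.log(p) + step_size * a for p, a in zip(pi_old, advantages)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Usage: the action with the higher advantage gains probability mass.
pi_new = mirror_descent_step([0.5, 0.5], [1.0, 0.0], step_size=1.0)
```

A smaller `step_size` keeps the new policy closer to the old one in KL divergence, which is the role the trust region plays in the abstract's description.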
Related papers
- Heterogeneous-agent Mirror Learning: A Continuum Of Solutions To Cooperative MARL (2022)
- Heterogeneous-agent Reinforcement Learning (2023)
- Trust Region Policy Optimisation In Multi-agent Reinforcement Learning (2021)
- Multi-agent Trust Region Policy Optimization (2020)
- Maximum Entropy Heterogeneous-agent Reinforcement Learning (2023)
- Heterogeneous Multi-robot Reinforcement Learning (2023)
- End-to-end Optimization Of LLM-driven Multi-agent Search Systems Via Heterogeneous-group-based Reinforcement Learning (2025)
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)