AgentMixer: Multi-agent Correlated Policy Factorization
2024 · Zhiyuan Li, Wenshuai Zhao, Lijun Wu, et al.
Abstract
In multi-agent reinforcement learning, centralized training with decentralized execution (CTDE) methods typically assume that agents act independently, each on its own local observations, which may fail to produce a coordinated, correlated joint policy. Coordination can be explicitly encouraged during training, and individual policies can then be trained to imitate the correlated joint policy. However, this may lead to an asymmetric learning failure caused by the observation mismatch between the joint and individual policies. Inspired by the concept of correlated equilibrium, we introduce a strategy modification called AgentMixer that allows agents to correlate their policies. AgentMixer non-linearly combines individual partially observable policies into a joint fully observable policy. To enable decentralized execution, we introduce Individual-Global-Consistency to guarantee mode consistency during joint training of the centralized and decentralized p
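The mode-consistency idea mentioned in the abstract can be illustrated with a small sketch. The abstract is cut off before it defines Individual-Global-Consistency, so the check below rests on an assumed reading: the most probable joint action of the joint policy should equal the tuple of each agent's most probable individual action. The function names and toy probabilities are hypothetical and are not taken from the paper.

```python
def joint_mode(joint_policy):
    # joint_policy maps a tuple of per-agent actions to its probability.
    return max(joint_policy, key=joint_policy.get)


def individual_modes(individual_policies):
    # Each individual policy is a list of action probabilities for one agent.
    return tuple(
        max(range(len(p)), key=lambda a: p[a]) for p in individual_policies
    )


def is_mode_consistent(joint_policy, individual_policies):
    # Assumed reading of mode consistency: the joint argmax matches
    # the per-agent argmaxes taken independently.
    return joint_mode(joint_policy) == individual_modes(individual_policies)


# Toy correlated joint policy for two agents with two actions each.
joint = {(0, 0): 0.5, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.4}

consistent = [[0.7, 0.3], [0.6, 0.4]]  # individual modes: (0, 0)
mismatched = [[0.7, 0.3], [0.4, 0.6]]  # individual modes: (0, 1)

print(is_mode_consistent(joint, consistent))  # True
print(is_mode_consistent(joint, mismatched))  # False
```

In the second case the agents, acting greedily on their own policies, would pick a joint action to which the joint policy assigns low probability. This is the kind of disagreement that a consistency constraint of this sort would rule out.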
Related papers
- More Centralized Training, Still Decentralized Execution: Multi-agent Conditional Policy Factorization (2022)
- QMIX: Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2018)
- Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2020)
- MMD-MIX: Value Function Factorisation With Maximum Mean Discrepancy For Cooperative Multi-agent Reinforcement Learning (2021)
- Comix: A Multi-agent Reinforcement Learning Training Architecture For Efficient Decentralized Coordination And Independent Decision-making (2023)
- Multi-agent Interactions Modeling With Correlated Policies (2020)
- Credit Assignment With Meta-policy Gradient For Multi-agent Reinforcement Learning (2021)
- RMIX: Learning Risk-sensitive Policies For Cooperative Reinforcement Learning Agents (2021)