QR-MIX: Distributional Value Function Factorisation For Cooperative Multi-agent Reinforcement Learning
2020 · Jian Hu, Seth Austin Harding, Haibin Wu, et al.
Abstract
In cooperative Multi-Agent Reinforcement Learning (MARL) under the Centralized Training with Decentralized Execution (CTDE) setting, agents observe and interact with their environment locally and independently. With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the long-term return as a scalar, which carries no information about this randomness. Our proposed model, QR-MIX, combines QMIX with Implicit Quantile Network (IQN) and uses quantile regression to model joint state-action values as a distribution. However, the monotonicity constraint in QMIX limits the expressiveness of the joint state-action value distribution and may lead to incorrect estimates in non-monotonic cases. Therefore, we propose a flexible loss function that approximates the monotonicity found in QMIX. Our model is not only more tolerant of the randomness
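To make the quantile-regression idea in the abstract concrete, the following is a minimal sketch of the standard quantile Huber loss used in quantile-based distributional RL such as IQN. It is not the flexible monotonicity loss the abstract refers to, and it is not taken from the authors' code. The function name `quantile_huber_loss`, its parameters, and the threshold `kappa` are illustrative assumptions.

```python
import numpy as np


def quantile_huber_loss(pred_quantiles, target_samples, taus, kappa=1.0):
    """Quantile Huber loss between predicted quantiles and target returns.

    Hypothetical helper for illustration only.
    pred_quantiles: shape (N,), predicted return at each quantile fraction.
    target_samples: shape (M,), samples of the target return distribution.
    taus:           shape (N,), quantile fraction in (0, 1) per prediction.
    kappa:          Huber threshold (assumed positive).
    """
    pred = np.asarray(pred_quantiles, dtype=float)
    target = np.asarray(target_samples, dtype=float)
    taus = np.asarray(taus, dtype=float)

    # Pairwise errors u[i, j] = target[j] - pred[i].
    u = target[None, :] - pred[:, None]
    abs_u = np.abs(u)

    # Huber penalty: quadratic near zero, linear beyond kappa.
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))

    # Asymmetric weight |tau - 1{u < 0}|: a high tau penalises
    # underestimation more heavily than overestimation.
    weight = np.abs(taus[:, None] - (u < 0).astype(float))

    # Mean over target samples, sum over predicted quantiles.
    return float((weight * huber / kappa).mean(axis=1).sum())


if __name__ == "__main__":
    # The same underestimate (prediction 0, target 1) costs more at tau = 0.9
    # than at tau = 0.1, which is what pushes each output toward its quantile.
    print(quantile_huber_loss([0.0], [1.0], [0.9]))
    print(quantile_huber_loss([0.0], [1.0], [0.1]))
```

In a QR-MIX-style setup one would presumably apply such a loss to quantiles of the joint state-action value produced by the mixing network, but that wiring is an assumption here and is not specified by the abstract.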
Related papers
- Weighted QMIX: Expanding Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2020)
- QMIX: Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2018)
- MMD-MIX: Value Function Factorisation With Maximum Mean Discrepancy For Cooperative Multi-agent Reinforcement Learning (2021)
- NQMIX: Non-monotonic Value Function Factorization For Deep Multi-agent Reinforcement Learning (2021)
- Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2020)
- SMIX(λ): Enhancing Centralized Value Functions For Cooperative Multi-agent Reinforcement Learning (2019)
- RMIX: Learning Risk-sensitive Policies For Cooperative Reinforcement Learning Agents (2021)
- POWQMIX: Weighted Value Factorization With Potentially Optimal Joint Actions Recognition For Cooperative Multi-agent Reinforcement Learning (2024)