ReMIX: Regret Minimization For Monotonic Value Function Factorization In Multiagent Reinforcement Learning
2023 · Yongsheng Mei, Hanhan Zhou, Tian Lan
Abstract
Value function factorization methods have become a dominant approach for cooperative multiagent reinforcement learning under the centralized training and decentralized execution paradigm. By factorizing the optimal joint action-value function using a monotonic mixing function of agents' utilities, these algorithms ensure consistency between joint and local action selections for decentralized decision-making. Nevertheless, the use of monotonic mixing functions also induces representational limitations. Finding the optimal projection of an unrestricted mixing function onto monotonic function classes is still an open problem. To this end, we propose ReMIX, which formulates this optimal projection problem for value function factorization as regret minimization over the projection weights of different state-action values. This optimization problem can be relaxed and solved using the Lagrangian multiplier method to obtain closed-form optimal projection weights. By minimizing the resulti
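The consistency property mentioned in the abstract can be illustrated with a small sketch. This is not the ReMIX algorithm or its projection weights; it only shows, under assumed toy numbers, why a mixing function that is monotonic in each agent's utility lets every agent pick its own greedy action and still recover the joint greedy action. The names `monotonic_mix`, `joint_greedy`, and `local_greedy`, the linear form of the mixer, and the utility values are all hypothetical choices made for illustration.

```python
import itertools

def monotonic_mix(utilities, weights, bias=0.0):
    """Toy mixer (assumption): linear with non-negative weights, hence
    monotonically non-decreasing in every agent's utility."""
    return sum(abs(w) * u for w, u in zip(weights, utilities)) + bias

def joint_greedy(q_agents, weights):
    """Brute-force argmax of the mixed joint value over all joint actions."""
    joint_actions = itertools.product(*(range(len(q)) for q in q_agents))
    return max(
        joint_actions,
        key=lambda joint: monotonic_mix(
            [q[a] for q, a in zip(q_agents, joint)], weights
        ),
    )

def local_greedy(q_agents):
    """Each agent independently maximizes its own utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_agents)

# Hypothetical per-agent utilities: 3 agents, 3 actions each.
q_agents = [
    [0.2, 1.5, -0.3],
    [0.9, 0.1, 0.4],
    [-1.0, 0.0, 2.0],
]
weights = [0.5, 1.2, 0.3]  # strictly positive, so the mixer is strictly monotonic

print(local_greedy(q_agents))
print(joint_greedy(q_agents, weights))
```

With strictly positive weights and no ties in the utilities, the two printed tuples coincide, so decentralized greedy selection matches the centralized one. The price is the representational limitation the abstract refers to: a joint value whose best action for one agent depends on the other agents' actions cannot be expressed exactly by such a mixer.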
Related papers
- QMIX: Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2018)
- Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2020)
- ConcaveQ: Non-monotonic Value Function Factorization Via Concave Representations In Deep Multi-agent Reinforcement Learning (2023)
- NQMIX: Non-monotonic Value Function Factorization For Deep Multi-agent Reinforcement Learning (2021)
- Weighted QMIX: Expanding Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2020)
- Beyond Monotonicity: Revisiting Factorization Principles In Multi-agent Q-learning (2025)
- POWQMIX: Weighted Value Factorization With Potentially Optimal Joint Actions Recognition For Cooperative Multi-agent Reinforcement Learning (2024)
- MMD-MIX: Value Function Factorisation With Maximum Mean Discrepancy For Cooperative Multi-agent Reinforcement Learning (2021)