SHAQ: Incorporating Shapley Value Theory Into Multi-agent Q-learning
2021 · Jianhong Wang, Yuan Zhang, Yunjie Gu, et al.
Abstract
Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in global reward games; however, its underlying mechanism is not yet fully understood. This paper studies a theoretical framework for value factorisation with interpretability via Shapley value theory. We generalise the Shapley value to Markov convex games, calling the result the Markov Shapley value (MSV), and apply it as a value factorisation method in global reward games, which follows from the equivalence between the two classes of games. Based on the properties of MSV, we derive the Shapley-Bellman optimality equation (SBOE) to evaluate the optimal MSV, which corresponds to an optimal joint deterministic policy. Furthermore, we propose the Shapley-Bellman operator (SBO), which is proved to solve SBOE. With a stochastic approximation and some transformations, a new MARL algorithm called Shapley Q-learning (SHAQ) is established, whose implementation is guided by the theoretical results of SBO and MSV. We also discuss the relationship
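The abstract builds on the classical Shapley value, which splits the value of a cooperative game among its players by averaging each player's marginal contribution over all coalitions it could join. The sketch below computes that classical quantity exactly for a small one-shot game; it is not the paper's Markov Shapley value or SHAQ, and the players `a`, `b`, `c`, their weights, and the characteristic function `v` are hypothetical choices made only for illustration. For reference, the classical definition for player $i$ in player set $N$ is $\phi_i = \sum_{C \subseteq N \setminus \{i\}} \frac{|C|!\,(|N|-|C|-1)!}{|N|!}\,\bigl(v(C \cup \{i\}) - v(C)\bigr)$.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact classical Shapley value of each player.

    v maps a frozenset of players (a coalition) to a real number,
    with v(frozenset()) assumed to be 0.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        # Enumerate every coalition C of the other players, grouped by size.
        for size in range(n):
            # Probability weight |C|! (n - |C| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                c = frozenset(coalition)
                # Marginal contribution of player i when joining coalition C.
                total += weight * (v(c | {i}) - v(c))
        phi[i] = total
    return phi


# Hypothetical convex (supermodular) game: a coalition's value is the
# square of its members' summed weights, so cooperation pays off.
players = ["a", "b", "c"]
w = {"a": 1, "b": 2, "c": 3}


def v(coalition):
    return sum(w[p] for p in coalition) ** 2


phi = shapley_values(players, v)
print({k: round(x, 6) for k, x in phi.items()})
# -> {'a': 6.0, 'b': 12.0, 'c': 18.0}
```

The shares sum to the grand-coalition value $v(\{a, b, c\}) = 36$ (the efficiency property), which is the intuition behind reading a global value as a sum of per-agent contributions. Exact enumeration is exponential in the number of players, so it is only practical for very small games like this one.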
Related papers
- Shapley Q-value: A Local Reward Approach To Solve Global Reward Games (2019)
- Towards Understanding Cooperative Multi-agent Q-learning With Value Factorization (2020)
- Collective Explainable AI: Explaining Cooperative Strategies And Agent Contribution In Multiagent Reinforcement Learning With Shapley Values (2021)
- A Theoretical Framework For Explaining Reinforcement Learning With Shapley Values (2025)
- Explaining Reinforcement Learning With Shapley Values (2023)
- The Shapley Value In Machine Learning (2022)
- DFAC Framework: Factorizing The Value Function Via Quantile Mixture For Multi-agent Distributional Q-learning (2021)
- Beyond Monotonicity: Revisiting Factorization Principles In Multi-agent Q-learning (2025)