PAC: Assisted Value Factorisation With Counterfactual Predictions In Multi-agent Reinforcement Learning
2022 · Hanhan Zhou, Tian Lan, Vaneet Aggarwal
Abstract
Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods. Owing to monotonicity, such factorization allows a joint action-value function to be optimized by maximizing factorized per-agent utilities. In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions can impose concurrent constraints (across different states) on the representable function class, causing significant estimation error during training. We tackle this limitation and propose PAC, a new framework leveraging Assistive information generated from Counterfactual Predictions of optimal joint action selection, which enables explicit assistance to value function factorization through a novel counterfactual loss. A variational inference-based information encoding method is developed to collect and encode the counterfactual predictions from an estimated baseline. To enable decentralized execution, we also der
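The monotonicity property mentioned in the abstract can be illustrated with a small sketch. This is not the PAC method: it assumes a simple linear mixer with non-negative weights, and the utilities, weights, bias, and function names are all hypothetical values chosen for illustration. It only shows why, under a monotonic mixer, each agent's greedy choice over its own utility matches the greedy joint action.

```python
from itertools import product

# Hypothetical per-agent utilities Q_i(a_i): two agents, three actions each.
agent_utilities = [
    [0.2, 1.5, -0.3],  # agent 0
    [0.9, -1.0, 0.4],  # agent 1
]

# Assumed mixing weights; keeping them non-negative makes the mixer monotonic.
mixing_weights = [0.7, 2.0]
bias = -0.5


def joint_value(joint_action):
    """Linear monotonic mixer: Q_tot = b + sum_i w_i * Q_i(a_i), with w_i >= 0."""
    return bias + sum(
        w * q[a] for w, q, a in zip(mixing_weights, agent_utilities, joint_action)
    )


def decentralized_greedy():
    """Each agent independently picks the action maximizing its own utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in agent_utilities)


def centralized_greedy():
    """Brute-force search over every joint action for the best Q_tot."""
    action_ranges = [range(len(q)) for q in agent_utilities]
    return max(product(*action_ranges), key=joint_value)


greedy_local = decentralized_greedy()
greedy_joint = centralized_greedy()
print(greedy_local, greedy_joint)
```

Because every weight is non-negative, raising any single agent's utility never lowers the mixed value, so the per-agent maximizers together also maximize the joint value. Setting a weight negative in this sketch breaks that agreement.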
Related papers
- More Centralized Training, Still Decentralized Execution: Multi-agent Conditional Policy Factorization (2022)
- A Unified Framework For Factorizing Distributional Value Functions For Multi-agent Reinforcement Learning (2023)
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019)
- Inducing Cooperation Via Team Regret Minimization Based Multi-agent Deep Reinforcement Learning (2019)
- Residual Q-networks For Value Function Factorizing In Multi-agent Reinforcement Learning (2022)
- DFAC Framework: Factorizing The Value Function Via Quantile Mixture For Multi-agent Distributional Q-learning (2021)
- Qfree: A Universal Value Function Factorization For Multi-agent Reinforcement Learning (2023)
- Modeling The Interaction Between Agents In Cooperative Multi-agent Reinforcement Learning (2021)