Value Function Decomposition For Iterative Design Of Reinforcement Learning Agents
2022 · James MacGlashan, Evan Archer, Alisa Devlic, et al.
Abstract
Designing reinforcement learning (RL) agents is typically a difficult process that requires numerous design iterations. Learning can fail for a multitude of reasons, and standard RL methods offer too few tools for gaining insight into the exact cause. In this paper, we show how to integrate value decomposition into a broad class of actor-critic algorithms and use it to assist in the iterative agent-design process. Value decomposition separates a reward function into distinct components and learns value estimates for each. These value estimates provide insight into an agent's learning and decision-making process and enable new training methods to mitigate common problems. As a demonstration, we introduce SAC-D, a variant of soft actor-critic (SAC) adapted for value decomposition. SAC-D maintains similar performance to SAC, while learning a larger set of value predictions. We also introduce decomposition-based tools that exploit this information, including a new reward influence metric,
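The idea of separating a reward into components and estimating a value for each can be illustrated with a minimal sketch. It is not the paper's SAC-D implementation; it only assumes that the total reward is the sum of its components, in which case the per-component discounted returns add up to the total return. The component names (`progress`, `collision`), the episode data, and the helper functions are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Discounted sum of a reward sequence, accumulated from the last step back."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def decomposed_returns(
    component_rewards: Dict[str, List[float]], gamma: float
) -> Dict[str, float]:
    """One discounted return per reward component."""
    return {
        name: discounted_return(rs, gamma)
        for name, rs in component_rewards.items()
    }


# Hypothetical three-step episode with two reward components (assumed data).
episode = {
    "progress": [1.0, 1.0, 1.0],
    "collision": [0.0, -2.0, 0.0],
}
gamma = 0.5

per_component = decomposed_returns(episode, gamma)

# The undecomposed reward at each step is assumed to be the sum of components.
total_rewards = [sum(step) for step in zip(*episode.values())]
total = discounted_return(total_rewards, gamma)

# Because discounting is linear, the component returns sum to the total return.
assert abs(sum(per_component.values()) - total) < 1e-12
```

Inspecting `per_component` shows which component drives the overall return, which is the kind of diagnostic signal the abstract attributes to learned component value estimates.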
Related papers
- SVDE: Scalable Value-decomposition Exploration For Cooperative Multi-agent Reinforcement Learning (2023) 0.00
- Understanding Value Decomposition Algorithms In Deep Cooperative Multi-agent Reinforcement Learning (2022) 0.00
- Decomposed Soft Actor-critic Method For Cooperative Multi-agent Reinforcement Learning (2021) 0.00
- Modeling The Interaction Between Agents In Cooperative Multi-agent Reinforcement Learning (2021) 0.00
- Adaptive Value Decomposition With Greedy Marginal Contribution Computation For Cooperative Multi-agent Reinforcement Learning (2023) 3.58
- Dual Self-awareness Value Decomposition Framework Without Individual Global Max For Cooperative Multi-agent Reinforcement Learning (2023) 0.00
- Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020) 17.77
- Disentangling Dynamics And Returns: Value Function Decomposition With Future Prediction (2019) 0.00