STAS: Spatial-temporal Return Decomposition For Multi-agent Reinforcement Learning
2023 · Sirui Chen, Zhaowei Zhang, Yaodong Yang, et al.
Abstract
Centralized Training with Decentralized Execution (CTDE) has been proven to be an effective paradigm in cooperative multi-agent reinforcement learning (MARL). One of the major challenges is credit assignment, which aims to credit agents by their contributions. While prior studies have shown great success, their methods typically fail to work in episodic reinforcement learning scenarios where global rewards are revealed only at the end of the episode. They lack the functionality to model complicated relations of the delayed global reward in the temporal dimension and suffer from inefficiencies. To tackle this, we introduce Spatial-Temporal Attention with Shapley (STAS), a novel method that learns credit assignment in both temporal and spatial dimensions. It first decomposes the global return back to each time step, then utilizes the Shapley Value to redistribute the individual payoff from the decomposed global reward. To mitigate the computational complexity of the Shapley Value, we int
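The spatial step described in the abstract, splitting a per-step reward among agents with the Shapley Value, can be illustrated with a minimal sketch. This is not the STAS implementation: it computes exact Shapley Values by enumerating every agent ordering, which is precisely the cost the paper sets out to reduce. The function `shapley_values` and the toy coalition reward `toy_reward` are hypothetical names and values chosen only for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over all orderings. Cost grows as n!, so this is only for tiny n."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_reward(coalition):
    """Hypothetical reward for one time step (an assumption, not from the paper):
    agents 0 and 1 earn 4.0 only when both participate; agent 2 earns 1.0 alone."""
    reward = 0.0
    if 0 in coalition and 1 in coalition:
        reward += 4.0
    if 2 in coalition:
        reward += 1.0
    return reward


agents = [0, 1, 2]
phi = shapley_values(agents, toy_reward)

# Efficiency property: individual credits sum to the full-team reward.
assert abs(sum(phi.values()) - toy_reward(frozenset(agents))) < 1e-9
print(phi)
```

In this toy game the two agents that must cooperate split their joint reward equally, and the independent agent keeps its own. Exact enumeration is feasible here only because there are three agents; with more agents an approximation becomes necessary.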
Related papers
- Shapley Counterfactual Credits For Multi-agent Reinforcement Learning (2021)
- Agent-temporal Credit Assignment For Optimal Policy Preservation In Sparse Multi-agent Reinforcement Learning (2024)
- CTDS: Centralized Teacher With Decentralized Student For Multi-agent Reinforcement Learning (2022)
- Agent-time Attention For Sparse Rewards Multi-agent Reinforcement Learning (2022)
- Agent-temporal Attention For Reward Redistribution In Episodic Multi-agent Reinforcement Learning (2022)
- GTDE: Grouped Training With Decentralized Execution For Multi-agent Actor-critic (2024)
- Locality Matters: A Scalable Value Decomposition Approach For Cooperative Multi-agent Reinforcement Learning (2021)
- Credit Assignment With Meta-policy Gradient For Multi-agent Reinforcement Learning (2021)