Offline Decentralized Multi-agent Reinforcement Learning
2021 · Jiechuan Jiang, Zongqing Lu
Abstract
In many real-world multi-agent cooperative tasks, high cost and risk prevent agents from continuously interacting with the environment to collect experience during learning, so they must learn from offline datasets. However, the transition dynamics in each agent's dataset can differ substantially from those induced by the learned policies of the other agents at execution time, which creates large errors in value estimates. Consequently, agents learn uncoordinated, low-performing policies. In this paper, we propose a framework for offline decentralized multi-agent reinforcement learning that exploits value deviation and transition normalization to deliberately modify the transition probabilities. Value deviation optimistically increases the transition probabilities of high-value next states, and transition normalization normalizes the transition probabilities of next states. Together, they enable agents to learn high-performing, coordinated policies. Theoretically, we prove the convergence of
Related papers
- Plan Better Amid Conservatism: Offline Multi-agent Reinforcement Learning With Actor Rectification (2021) · 0.00
- Multi-agent Fully Decentralized Value Function Learning With Linear Convergence Rates (2018) · 10.21
- Offline Multi-agent Reinforcement Learning With Implicit Global-to-local Value Regularization (2023) · 5.84
- Madiff: Offline Multi-agent Learning With Diffusion Models (2023) · 2.26
- Counterfactual Conservative Q Learning For Offline Multi-agent Reinforcement Learning (2023) · 0.00
- Conservative Equilibrium Discovery In Offline Game-theoretic Multiagent Reinforcement Learning (2026) · 0.00
- Mean-field Multi-agent Reinforcement Learning: A Decentralized Network Approach (2021) · 0.00
- Value Propagation For Decentralized Networked Deep Multi-agent Reinforcement Learning (2019) · 0.00