Plan Better Amid Conservatism: Offline Multi-agent Reinforcement Learning With Actor Rectification
2021 · Ling Pan, Longbo Huang, Tengyu Ma, et al.
Abstract
Conservatism has led to significant progress in offline reinforcement learning (RL), where an agent learns from pre-collected datasets. However, as many real-world scenarios involve interaction among multiple agents, it is important to address offline RL in the multi-agent setting. Given the recent success of transferring online RL algorithms to the multi-agent setting, one may expect that offline RL algorithms will also transfer to multi-agent settings directly. Surprisingly, we empirically observe that conservative offline RL algorithms do not work well in the multi-agent setting: performance degrades significantly as the number of agents increases. Towards mitigating the degradation, we identify a key issue: non-concavity of the value function makes policy gradient improvements prone to local optima. Multiple agents exacerbate the problem severely, since a suboptimal policy of any single agent can lead to uncoordinated global failure. Following this intuition, we propose a
Related papers
- Counterfactual Conservative Q Learning For Offline Multi-agent Reinforcement Learning (2023) 0.00
- Long-horizon Model-based Offline Reinforcement Learning Without Conservatism (2025) 0.00
- Conservative Equilibrium Discovery In Offline Game-theoretic Multiagent Reinforcement Learning (2026) 0.00
- Compositional Conservatism: A Transductive Approach In Offline Reinforcement Learning (2024) 1.81
- Believe What You See: Implicit Constraint Approach For Offline Multi-agent Reinforcement Learning (2021) 0.00
- Optimal Conservative Offline RL With General Function Approximation Via Augmented Lagrangian (2022) 0.00
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023) 0.00
- Offline Decentralized Multi-agent Reinforcement Learning (2021) 7.50