Counterfactual Conservative Q Learning For Offline Multi-agent Reinforcement Learning
2023 · Jianzhun Shao, Yun Qu, Chen Chen, et al.
Abstract
Offline multi-agent reinforcement learning is challenging because the distribution shift common in the offline setting is coupled with the high dimensionality common in the multi-agent setting, which makes out-of-distribution (OOD) actions and value overestimation excessively severe. To mitigate this problem, we propose a novel multi-agent offline RL algorithm, named CounterFactual Conservative Q-Learning (CFCQL), to conduct conservative value estimation. Rather than regarding all the agents as a single high-dimensional agent and directly applying single-agent methods to it, CFCQL calculates a conservative regularization for each agent separately in a counterfactual way and then linearly combines them to realize an overall conservative value estimation. We prove that it still enjoys the underestimation property and the performance guarantee that single-agent conservative methods do, but the induced regularization and safe policy improvement bound are independent of the
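The per-agent counterfactual regularization described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the log-sum-exp form of the penalty is borrowed from single-agent CQL, the uniform combination weights are an assumption, and the names `counterfactual_penalty`, `q`, `n_actions`, and `data_action` are hypothetical. For each agent, only that agent's action is varied while the other agents keep their dataset actions, and the resulting per-agent penalties are combined linearly.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def counterfactual_penalty(q, n_actions, data_action, weights=None):
    """Sketch of a per-agent conservative penalty for one state.

    q           -- dict mapping joint-action tuples to Q-values (hypothetical layout)
    n_actions   -- number of discrete actions available to each agent
    data_action -- joint action observed in the offline dataset
    weights     -- linear combination weights; uniform by default (an assumption)
    """
    n = len(n_actions)
    if weights is None:
        weights = [1.0 / n] * n
    q_data = q[tuple(data_action)]

    per_agent = []
    for i in range(n):
        # Counterfactual: vary only agent i's action, keep the others at dataset actions.
        counterfactual_qs = []
        for a_i in range(n_actions[i]):
            joint = list(data_action)
            joint[i] = a_i
            counterfactual_qs.append(q[tuple(joint)])
        # CQL-style gap: soft maximum over agent i's actions minus the dataset value.
        per_agent.append(logsumexp(counterfactual_qs) - q_data)

    # Linear combination of the per-agent penalties.
    total = sum(w * p for w, p in zip(weights, per_agent))
    return total, per_agent


# Two agents with 2 and 3 actions; all Q-values equal, so each gap is log(A_i).
q_table = {(a, b): 0.0 for a in range(2) for b in range(3)}
total, per_agent = counterfactual_penalty(q_table, [2, 3], (0, 1))
```

In this sketch each agent's term only enumerates that agent's own actions, so the cost grows with the sum of the per-agent action counts rather than with the size of the joint action space.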
Related papers
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022)
- Plan Better Amid Conservatism: Offline Multi-agent Reinforcement Learning With Actor Rectification (2021)
- Believe What You See: Implicit Constraint Approach For Offline Multi-agent Reinforcement Learning (2021)
- ACL-QL: Adaptive Conservative Level In Q-learning For Offline Reinforcement Learning (2024)
- Conservative Equilibrium Discovery In Offline Game-theoretic Multiagent Reinforcement Learning (2026)
- Federated Offline Reinforcement Learning: Collaborative Single-policy Coverage Suffices (2024)
- A Perspective Of Q-value Estimation On Offline-to-online Reinforcement Learning (2023)
- Diverse Randomized Value Functions: A Provably Pessimistic Approach For Offline Reinforcement Learning (2024)