Mildly Conservative Q-learning For Offline Reinforcement Learning
2022 · Jiafei Lyu, Xiaoteng Ma, Xiu Li, et al.
Abstract
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset without continually interacting with the environment. The distribution shift between the learned policy and the behavior policy makes it necessary for the value function to stay conservative such that out-of-distribution (OOD) actions will not be severely overestimated. However, existing approaches, penalizing the unseen actions or regularizing with the behavior policy, are too pessimistic, which suppresses the generalization of the value function and hinders the performance improvement. This paper explores mild but enough conservatism for offline learning while not harming generalization. We propose Mildly Conservative Q-learning (MCQ), where OOD actions are actively trained by assigning them proper pseudo Q values. We theoretically show that MCQ induces a policy that behaves at least as well as the behavior policy and no erroneous overestimation will occur for OOD actions. Experimental resul
Related papers
- ACL-QL: Adaptive Conservative Level In Q-learning For Offline Reinforcement Learning (2024)
- Counterfactual Conservative Q Learning For Offline Multi-agent Reinforcement Learning (2023)
- Constraints Penalized Q-learning For Safe Offline Reinforcement Learning (2021)
- DOMAIN: Mildly Conservative Model-based Offline Reinforcement Learning (2023)
- Confidence-conditioned Value Functions For Offline Reinforcement Learning (2022)
- Q-distribution Guided Q-learning For Offline Reinforcement Learning: Uncertainty Penalized Q-value Via Consistency Model (2024)
- Uncertainty-based Offline Reinforcement Learning With Diversified Q-ensemble (2021)
- MICRO: Model-based Offline Reinforcement Learning With A Conservative Bellman Operator (2023)