Diverse Randomized Value Functions: A Provably Pessimistic Approach For Offline Reinforcement Learning
2024 · Xudong Yu, Chenjia Bai, Hongyi Guo, et al.
Abstract
Offline Reinforcement Learning (RL) faces distributional shift and unreliable value estimation, especially for out-of-distribution (OOD) actions. To address this, existing uncertainty-based methods penalize the value function using uncertainty quantification, but they require numerous ensemble networks, which raises computational cost and can lead to suboptimal outcomes. In this paper, we introduce a novel strategy that employs diverse randomized value functions to estimate the posterior distribution of \(Q\)-values. It provides robust uncertainty quantification and estimates lower confidence bounds (LCB) of \(Q\)-values. By applying moderate value penalties to OOD actions, our method yields a provably pessimistic approach. We also emphasize diversity within the randomized value functions and improve efficiency by introducing a diversity regularization method that reduces the required number of networks. These modules lead to reliable value estimation and efficient policy learning from offline data. Theoretic
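To make the lower-confidence-bound idea in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes that several sampled \(Q\)-value estimates per action are already available and uses the generic rule "mean minus a multiple of the standard deviation" as the LCB. The function name `lcb_q_values`, the penalty weight `beta`, and the toy numbers are all hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev


def lcb_q_values(q_samples, beta=1.0):
    """Return a pessimistic Q-value per action (illustrative assumption).

    q_samples: list of rows, one row per sampled value function,
               each row holding one Q-value per action.
    beta:      hypothetical penalty weight on the disagreement (std).
    """
    # Transpose so that each tuple holds all samples for one action.
    per_action = list(zip(*q_samples))
    # LCB = sample mean minus beta times the population standard deviation.
    return [fmean(qs) - beta * pstdev(qs) for qs in per_action]


# Toy data: 4 sampled value functions, 2 actions.
# Action 0: samples agree (low uncertainty, as for in-distribution actions).
# Action 1: samples disagree (high uncertainty, as for OOD actions).
q_samples = [
    [1.0, 2.0],
    [1.0, 0.0],
    [1.0, 4.0],
    [1.0, 2.0],
]

mean_q = [fmean(qs) for qs in zip(*q_samples)]
lcb = lcb_q_values(q_samples, beta=1.0)

# Greedy action under the plain mean versus under the LCB.
greedy_mean = max(range(len(mean_q)), key=mean_q.__getitem__)
greedy_lcb = max(range(len(lcb)), key=lcb.__getitem__)
```

In this toy case the plain mean prefers action 1, while the LCB prefers action 0 because the samples for action 1 disagree. This is the sense in which a penalty based on uncertainty discourages actions whose value estimates are unreliable.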
Related papers
- Confidence-conditioned Value Functions For Offline Reinforcement Learning (2022)
- Uncertainty-based Offline Reinforcement Learning With Diversified Q-ensemble (2021)
- Pessimistic Bootstrapping For Uncertainty-driven Offline Reinforcement Learning (2022)
- Q-distribution Guided Q-learning For Offline Reinforcement Learning: Uncertainty Penalized Q-value Via Consistency Model (2024)
- Viper: Provably Efficient Algorithm For Offline RL With Neural Function Approximation (2023)
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022)
- Why So Pessimistic? Estimating Uncertainties For Offline RL Through Ensembles, And Why Their Independence Matters (2022)
- Pessimistic Nonlinear Least-squares Value Iteration For Offline Reinforcement Learning (2023)