Hybrid Value Estimation For Off-policy Evaluation And Offline Reinforcement Learning
2022 · Xue-Kun Jin, Xu-Hui Liu, Shengyi Jiang, et al.
Abstract
Value function estimation is an indispensable subroutine in reinforcement learning, and it becomes more challenging in the offline setting. In this paper, we propose Hybrid Value Estimation (HVE) to reduce value estimation error. HVE trades off bias and variance by balancing value estimates from offline data against those from a learned model. Theoretical analysis shows that HVE enjoys a better error bound than the direct methods. HVE can be used in both off-policy evaluation and offline reinforcement learning. We therefore provide two concrete algorithms, Off-policy HVE (OPHVE) and Model-based Offline HVE (MOHVE), one for each setting. Empirical evaluations on MuJoCo tasks corroborate the theoretical claim. OPHVE outperforms other off-policy evaluation methods on all three metrics of estimation effectiveness, while MOHVE performs better than or comparably to state-of-the-art offline reinforcement learning algorithms. We hope that HVE could shed some lig
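The bias-variance trade-off mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's HVE estimator; it is a generic convex combination of two estimates, under the assumptions that the data-based estimate is unbiased but noisy, the model-based estimate has a fixed bias and lower variance, and their errors are independent. The names `hybrid_mse`, `best_weight`, and all numeric values are hypothetical.

```python
def hybrid_mse(lam, data_var, model_var, model_bias):
    """Mean squared error of (1 - lam) * data_estimate + lam * model_estimate.

    Assumptions (illustrative only, not taken from the paper):
    the data-based estimate is unbiased with variance data_var, the
    model-based estimate has bias model_bias and variance model_var,
    and the two estimation errors are independent.
    """
    bias_sq = (lam * model_bias) ** 2
    variance = (1.0 - lam) ** 2 * data_var + lam ** 2 * model_var
    return bias_sq + variance


def best_weight(data_var, model_var, model_bias):
    """Weight on the model estimate that minimizes hybrid_mse.

    Obtained by setting the derivative of hybrid_mse with respect to lam to zero.
    """
    return data_var / (data_var + model_var + model_bias ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: noisy data estimate, biased but stable model estimate.
    data_var, model_var, model_bias = 4.0, 0.25, 1.0
    lam = best_weight(data_var, model_var, model_bias)
    for name, w in [("data only", 0.0), ("model only", 1.0), ("hybrid", lam)]:
        print(name, round(hybrid_mse(w, data_var, model_var, model_bias), 4))
```

Under these assumptions the blended estimate has a lower mean squared error than either source alone, which is the intuition behind balancing the two; how HVE itself sets the balance is defined in the paper, not here.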
Related papers
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022)
- Doubly Robust Off-policy Value And Gradient Estimation For Deterministic Policies (2020)
- High-confidence Error Estimates For Learned Value Functions (2018)
- Variance-aware Off-policy Evaluation With Linear Function Approximation (2021)
- Diverse Randomized Value Functions: A Provably Pessimistic Approach For Offline Reinforcement Learning (2024)
- Towards Hyperparameter-free Policy Selection For Offline Reinforcement Learning (2021)
- Unifying Gradient Estimators For Meta-reinforcement Learning Via Off-policy Evaluation (2021)
- Statistically Efficient Variance Reduction With Double Policy Estimation For Off-policy Evaluation In Sequence-modeled Reinforcement Learning (2023)