Confidence-conditioned Value Functions For Offline Reinforcement Learning
2022 Β· Joey Hong, Aviral Kumar, Sergey Levine
Abstract
Offline reinforcement learning (RL) promises the ability to learn effective policies solely using existing, static datasets, without any costly online interaction. To do so, offline RL methods must handle distributional shift between the dataset and the learned policy. The most common approach is to learn conservative, or lower-bound, value functions, which underestimate the return of out-of-distribution (OOD) actions. However, such methods exhibit one notable drawback: policies optimized on such value functions can only behave according to a fixed, possibly suboptimal, degree of conservatism. However, this can be alleviated if we instead are able to learn policies for varying degrees of conservatism at training time and devise a method to dynamically choose one of them during evaluation. To do so, in this work, we propose learning value functions that additionally condition on the degree of conservatism, which we dub confidence-conditioned value functions. We derive a new form of a Be
Authors
(none)
Tags
Stats
Related papers
- Diverse Randomized Value Functions: A Provably Pessimistic Approach For Offline Reinforcement Learning (2024)3.58
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022)0.00
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022)0.00
- Offline Reinforcement Learning: Fundamental Barriers For Value Function Approximation (2021)0.00
- Optimal Conservative Offline RL With General Function Approximation Via Augmented Lagrangian (2022)0.00
- Pessimistic Bootstrapping For Uncertainty-driven Offline Reinforcement Learning (2022)0.00
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)0.00
- Compositional Conservatism: A Transductive Approach In Offline Reinforcement Learning (2024)1.81