Pessimistic Nonlinear Least-squares Value Iteration For Offline Reinforcement Learning
2023 · Qiwei Di, Heyang Zhao, Jiafan He, et al.
Abstract
Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based on the data collected by a behavior policy, has attracted increasing attention in recent years. While offline RL with linear function approximation has been extensively studied, with optimal results achieved under certain assumptions, many recent works have shifted their focus to offline RL with non-linear function approximation. However, few works on offline RL with non-linear function approximation provide instance-dependent regret guarantees. In this paper, we propose an oracle-efficient algorithm, dubbed Pessimistic Nonlinear Least-Square Value Iteration (PNLSVI), for offline RL with non-linear function approximation. Our algorithmic design comprises three innovative components: (1) a variance-based weighted regression scheme that can be applied to a wide range of function classes, (2) a subroutine for variance estimation, and (3) a planning phase that utilizes a pessimistic value iteration approach. O
Related papers
- Near-optimal Offline Reinforcement Learning With Linear Representation: Leveraging Variance Information With Pessimism (2022)
- Viper: Provably Efficient Algorithm For Offline RL With Neural Function Approximation (2023)
- Is Pessimism Provably Efficient For Offline RL? (2020)
- Nearly Minimax Optimal Offline Reinforcement Learning With Linear Function Approximation: Single-agent MDP And Markov Game (2022)
- POPO: Pessimistic Offline Policy Optimization (2020)
- Distributionally Robust Offline Reinforcement Learning With Linear Function Approximation (2022)
- Pessimistic Bootstrapping For Uncertainty-driven Offline Reinforcement Learning (2022)
- Diverse Randomized Value Functions: A Provably Pessimistic Approach For Offline Reinforcement Learning (2024)