Near-optimal Offline Reinforcement Learning With Linear Representation: Leveraging Variance Information With Pessimism
2022 · Ming Yin, Yaqi Duan, Mengdi Wang, et al.
Abstract
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize sequential decision-making strategies, has gained surging prominence in recent studies. Because appropriate function approximators can help mitigate the sample complexity burden in modern reinforcement learning problems, existing endeavors usually rely on powerful function representation models (e.g. neural networks) to learn the optimal policies. However, a precise understanding of the statistical limits with function representations remains elusive, even when such a representation is linear. Towards this goal, we study the statistical limits of offline reinforcement learning with linear model representations. To derive the tight offline learning bound, we design the variance-aware pessimistic value iteration (VAPVI), which adopts the conditional variance information of the value function for time-inhomogeneous episodic linear Markov decision processes (MDPs). VAPVI leverage
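To make the idea of combining variance information with pessimism concrete, the following is a minimal sketch of a single variance-weighted least-squares step with a pessimism penalty, written with NumPy. It is an illustration under stated assumptions, not the paper's algorithm: the function name `weighted_pessimistic_step`, the parameters `lam` and `beta`, the exact form of the penalty, and the assumption that the conditional variance estimates `sigma2` are already available are all hypothetical choices made for this example.

```python
import numpy as np


def weighted_pessimistic_step(phi, targets, sigma2, lam=1.0, beta=1.0):
    """One variance-weighted ridge regression step with a pessimism penalty.

    phi:     (n, d) feature matrix, one row per offline transition
    targets: (n,) regression targets, e.g. reward plus next-step value
    sigma2:  (n,) conditional variance estimates (assumed given and positive)
    lam:     ridge regularization strength (hypothetical default)
    beta:    scale of the pessimism penalty (hypothetical default)
    """
    weights = 1.0 / sigma2
    d = phi.shape[1]
    # Weighted Gram matrix: sum_i phi_i phi_i^T / sigma_i^2 + lam * I
    gram = (phi * weights[:, None]).T @ phi + lam * np.eye(d)
    # Weighted ridge solution: low-variance samples count for more
    w_hat = np.linalg.solve(gram, phi.T @ (weights * targets))
    gram_inv = np.linalg.inv(gram)

    def q_lower(x):
        # Subtract an uncertainty penalty, then clip at zero
        penalty = beta * np.sqrt(x @ gram_inv @ x)
        return max(float(x @ w_hat - penalty), 0.0)

    return w_hat, q_lower


# Two transitions with orthogonal features; the first has lower variance.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
targets = np.array([1.0, 2.0])
w_equal, q_equal = weighted_pessimistic_step(phi, targets, np.array([1.0, 1.0]))
w_var, q_var = weighted_pessimistic_step(phi, targets, np.array([0.25, 1.0]))
```

In this toy setup, lowering the variance estimate of the first transition increases its weight in the regression and shrinks the penalty along its feature direction, so the pessimistic estimate there is less conservative.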
Related papers
- Is Pessimism Provably Efficient For Offline RL? (2020) 0.00
- Pessimistic Nonlinear Least-squares Value Iteration For Offline Reinforcement Learning (2023) 0.00
- Nearly Minimax Optimal Offline Reinforcement Learning With Linear Function Approximation: Single-agent MDP And Markov Game (2022) 0.00
- Near-optimal Offline Reinforcement Learning Via Double Variance Reduction (2021) 0.00
- Pessimistic Q-learning For Offline Reinforcement Learning: Towards Optimal Sample Complexity (2022) 0.00
- Viper: Provably Efficient Algorithm For Offline RL With Neural Function Approximation (2023) 0.00
- Bellman-consistent Pessimism For Offline Reinforcement Learning (2021) 0.00
- Revisiting Design Choices In Offline Model-based Reinforcement Learning (2021) 6.34