Achieving The Asymptotically Optimal Sample Complexity Of Offline Reinforcement Learning: A DRO-Based Approach
2023 · Yue Wang, Jinjun Xiong, Shaofeng Zou
Abstract
Offline reinforcement learning aims to learn from pre-collected datasets without active exploration. This problem faces significant challenges, including limited data availability and distributional shift. Existing approaches adopt a pessimistic stance towards uncertainty: they penalize the rewards of under-explored state-action pairs so that value functions are estimated conservatively. In this paper, we show that the distributionally robust optimization (DRO) based approach can also address these challenges and is asymptotically minimax optimal. Specifically, we directly model the uncertainty in the transition kernel and construct an uncertainty set of statistically plausible transition kernels. We then show that the policy that optimizes the worst-case performance over this uncertainty set achieves near-optimal performance in the underlying problem. We first design a metric-based uncertainty set over transition distributions such that, with high probability, the true transition kernel lies in this set. We prov…
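The DRO idea described above (build a set of plausible transition kernels around the empirical one, then optimize the worst case over that set) can be sketched as robust value iteration on a small tabular MDP. This is a minimal illustration under assumptions that are not taken from the paper: the uncertainty set is (s, a)-rectangular, each per-pair set is an L1 (total-variation) ball around the empirical kernel P̂(·|s,a), the radius follows a hypothetical Hoeffding-style rule sqrt(2 log(2SA/δ) / N(s,a)), and pairs with no data get the whole probability simplex. It is not the authors' algorithm or their uncertainty set; the names `worst_case_expectation` and `robust_value_iteration`, and all toy numbers, are illustrative.

```python
import math


def worst_case_expectation(p, v, radius):
    """Smallest expected value of v over distributions q with ||q - p||_1 <= radius.

    For an L1 (total-variation) ball intersected with the probability simplex,
    the minimizer moves up to radius / 2 of probability mass from the
    highest-value next states onto the single lowest-value next state.
    """
    q = list(p)
    s_min = min(range(len(v)), key=lambda s: v[s])
    # Mass actually moved: limited by the ball and by how much room s_min has.
    moved = min(radius / 2.0, 1.0 - q[s_min])
    q[s_min] += moved
    # Take that mass away from the best next states first.
    remaining = moved
    for s in sorted(range(len(v)), key=lambda s: v[s], reverse=True):
        if s == s_min:
            continue
        take = min(q[s], remaining)
        q[s] -= take
        remaining -= take
    return sum(qs * vs for qs, vs in zip(q, v))


def robust_value_iteration(p_hat, counts, rewards, gamma, delta=0.05, iters=500):
    """Robust value iteration over per-(s, a) L1 balls around the empirical kernel.

    p_hat[s][a]  : empirical next-state distribution (list of length S)
    counts[s][a] : number of samples of (s, a) in the offline dataset
    rewards[s][a]: reward, assumed known here for simplicity
    """
    n_s, n_a = len(rewards), len(rewards[0])

    def radius(n):
        # Hypothetical Hoeffding-style radius; not the paper's exact choice.
        return math.sqrt(2.0 * math.log(2.0 * n_s * n_a / delta) / n)

    def robust_q(v):
        q = [[0.0] * n_a for _ in range(n_s)]
        for s in range(n_s):
            for a in range(n_a):
                n = counts[s][a]
                if n == 0:
                    # No data: every kernel is plausible, so assume the worst state.
                    worst = min(v)
                else:
                    worst = worst_case_expectation(p_hat[s][a], v, radius(n))
                q[s][a] = rewards[s][a] + gamma * worst
        return q

    v = [0.0] * n_s
    for _ in range(iters):
        v = [max(row) for row in robust_q(v)]
    q = robust_q(v)
    policy = [max(range(n_a), key=lambda a: q[s][a]) for s in range(n_s)]
    return v, policy


if __name__ == "__main__":
    # Toy 2-state, 2-action dataset (numbers made up for illustration).
    p_hat = [[[1.0, 0.0], [0.0, 0.0]],   # state 0: action 1 never observed
             [[0.0, 1.0], [0.0, 1.0]]]   # state 1: absorbing, reward 0
    counts = [[100, 0], [100, 100]]
    rewards = [[0.5, 0.6], [0.0, 0.0]]
    v, policy = robust_value_iteration(p_hat, counts, rewards, gamma=0.9)
    print(v, policy)  # policy[0] is 0: the well-covered action is preferred
```

In this toy run the never-observed action in state 0 is evaluated as if it led to the worst next state, so conservatism comes from the shape of the uncertainty set rather than from an explicit reward penalty. Larger sample counts shrink the radius and make the robust values approach the plug-in ones.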
Related papers
- Distributionally Robust Model-based Offline Reinforcement Learning With Near-optimal Sample Complexity (2022)
- Revisiting Design Choices In Offline Model-based Reinforcement Learning (2021)
- Sample Complexity Of Offline Distributionally Robust Linear Markov Decision Processes (2024)
- Bridging Distributionally Robust Learning And Offline RL: An Approach To Mitigate Distribution Shift And Partial Data Coverage (2023)
- Pessimistic Q-learning For Offline Reinforcement Learning: Towards Optimal Sample Complexity (2022)
- Minimax Optimal And Computationally Efficient Algorithms For Distributionally Robust Offline Reinforcement Learning (2024)
- Sample Efficient Active Algorithms For Offline Reinforcement Learning (2026)
- Sample Complexity Of Offline Reinforcement Learning With Deep Relu Networks (2021)