Achieving the Asymptotically Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach

Abstract

Offline reinforcement learning aims to learn from pre-collected datasets without active exploration. This problem faces significant challenges, including limited data availability and distributional shifts. Existing approaches adopt a pessimistic stance towards uncertainty by penalizing rewards of under-explored state-action pairs to estimate value functions conservatively. In this paper, we show that the distributionally robust optimization (DRO) based approach can also address these challenges and is asymptotically minimax optimal. Specifically, we directly model the uncertainty in the transition kernel and construct an uncertainty set of statistically plausible transition kernels. We then show that the policy that optimizes the worst-case performance over this uncertainty set has a near-optimal performance in the underlying problem. We first design a metric-based distribution-based uncertainty set such that with high probability the true transition kernel is in this set. We prov
