Distributionally Robust Model-based Offline Reinforcement Learning With Near-optimal Sample Complexity
2022 Β· Laixi Shi, Yuejie Chi
Abstract
This paper concerns the central issues of model robustness and sample efficiency in offline reinforcement learning (RL), which aims to learn to perform decision making from history data without active exploration. Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy -- with as few samples as possible -- that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset. We consider a distributionally robust formulation of offline RL, focusing on tabular robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings. To combat with sample scarcity, a model-based algorithm that combines distributionally robust value iteration with the principle of pessimism in the face of uncertainty is proposed, by penalizing the robust value estimates with a carefully designed data-driven penalty term. Under a mild
Authors
(none)
Tags
Stats
Related papers
- Sample Complexity Of Offline Distributionally Robust Linear Markov Decision Processes (2024)0.00
- Distributionally Robust Offline Reinforcement Learning With Linear Function Approximation (2022)0.00
- Double Pessimism Is Provably Efficient For Distributionally Robust Offline Reinforcement Learning: Generic Algorithm And Robust Partial Coverage (2023)0.00
- Minimax Optimal And Computationally Efficient Algorithms For Distributionally Robust Offline Reinforcement Learning (2024)0.00
- On The Sample Complexity Of Vanilla Model-based Offline Reinforcement Learning With Dependent Samples (2023)2.26
- Online Robust Reinforcement Learning With Model Uncertainty (2021)0.00
- Bridging Distributionally Robust Learning And Offline RL: An Approach To Mitigate Distribution Shift And Partial Data Coverage (2023)0.00
- The Curious Price Of Distributional Robustness In Reinforcement Learning With A Generative Model (2023)0.00