Sample Complexity Of Robust Reinforcement Learning With A Generative Model
2021 · Kishan Panaganti, Dileep Kalathil
Abstract
The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an \(\epsilon\)-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorit
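The pipeline the abstract describes (estimate the nominal model from generative-model samples, then solve the max-min problem over an uncertainty set) can be illustrated with a minimal sketch. This is not the authors' algorithm as published; it is one plausible reading for the total variation case only, assuming a tabular MDP, a fixed number of samples per state-action pair, and an uncertainty set of the form "total variation distance to the estimated model at most `radius`". All function and parameter names (`worst_case_tv`, `estimate_nominal`, `robust_value_iteration`, `sample_next_state`, `radius`) are hypothetical.

```python
import numpy as np


def worst_case_tv(p0, v, radius):
    """Minimize p @ v over distributions p with TV(p, p0) <= radius.

    Assumption: TV(p, p0) = 0.5 * ||p - p0||_1. The minimizer shifts up to
    `radius` probability mass from the highest-value states to the
    lowest-value state.
    """
    p = np.asarray(p0, dtype=float).copy()
    lo = int(np.argmin(v))
    budget = min(radius, 1.0 - p[lo])  # cannot move more mass than exists elsewhere
    p[lo] += budget
    for s in np.argsort(v)[::-1]:  # states in decreasing order of value
        if budget <= 0:
            break
        if s == lo:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
    return p


def estimate_nominal(sample_next_state, n_states, n_actions, n_samples, rng):
    """Empirical transition model from n_samples generative-model calls per (s, a)."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                counts[s, a, sample_next_state(s, a, rng)] += 1
    return counts / n_samples


def robust_value_iteration(p_hat, reward, gamma, radius, n_iters=200):
    """Value iteration where each backup uses the worst model in the TV ball."""
    n_states, n_actions, _ = p_hat.shape
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                p_worst = worst_case_tv(p_hat[s, a], v, radius)
                q[s, a] = reward[s, a] + gamma * (p_worst @ v)
        v = q.max(axis=1)
    return v, q.argmax(axis=1)
```

Setting `radius=0` reduces the sketch to ordinary value iteration on the estimated model. The chi-square and KL uncertainty sets mentioned in the abstract would need a different inner minimization, which this sketch does not attempt.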
Related papers
- The Curious Price Of Distributional Robustness In Reinforcement Learning With A Generative Model (2023)
- Policy Learning For Robust Markov Decision Process With A Mismatched Generative Model (2022)
- Robust Reinforcement Learning Using Least Squares Policy Iteration With Provable Performance Guarantees (2020)
- Sample Complexity Of Offline Distributionally Robust Linear Markov Decision Processes (2024)
- Online Robust Reinforcement Learning With Model Uncertainty (2021)
- A Bayesian Approach To Robust Reinforcement Learning (2019)
- Model-free Robust \(\phi\)-divergence Reinforcement Learning Using Both Offline And Online Data (2024)
- Sample-efficient Robust Multi-agent Reinforcement Learning In The Face Of Environmental Uncertainty (2024)