Model-free Robust \(\phi\)-divergence Reinforcement Learning Using Both Offline And Online Data
2024 · Kishan Panaganti, Adam Wierman, Eric Mazumdar
Abstract
The robust \(\phi\)-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust \(\phi\)-regularized fitted Q-iteration (RPQ) for learning an \(\epsilon\)-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (with robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of \(\phi\)-divergences achieving robust optimal policies in high-dimensional systems with general function approximation. Second, we introduce the hybrid robust \(\phi\)-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. Towards this framework, we propose a model-fre
Related papers
- Sample Complexity Of Robust Reinforcement Learning With A Generative Model (2021)
- Robust Reinforcement Learning Using Least Squares Policy Iteration With Provable Performance Guarantees (2020)
- Policy Learning For Robust Markov Decision Process With A Mismatched Generative Model (2022)
- A Bayesian Approach To Robust Reinforcement Learning (2019)
- Online Robust Reinforcement Learning With Model Uncertainty (2021)
- On The Foundation Of Distributionally Robust Reinforcement Learning (2023)
- The Curious Price Of Distributional Robustness In Reinforcement Learning With A Generative Model (2023)
- Distributionally Robust Model-based Offline Reinforcement Learning With Near-optimal Sample Complexity (2022)