Model-free Robust \(\phi\)-divergence Reinforcement Learning Using Both Offline And Online Data

Abstract

The robust \(\phi\)-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties due to mismatches between the simulator (nominal) model and real-world settings. This work makes two important contributions. First, we propose a model-free algorithm called Robust \(\phi\)-regularized fitted Q-iteration (RPQ) for learning an \(\epsilon\)-optimal robust policy that uses only the historical data collected by rolling out a behavior policy (with robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis for a class of \(\phi\)-divergences achieving robust optimal policies in high-dimensional systems with general function approximation. Second, we introduce the hybrid robust \(\phi\)-regularized reinforcement learning framework to learn an optimal robust policy using both historical data and online sampling. Towards this framework, we propose a model-fre

Model-free Robust \(\phi\)-divergence Reinforcement Learning Using Both Offline And Online Data

Abstract

Authors

Tags

Stats

Related papers