Abstract

The robust \(\phi\)-regularized Markov Decision Process (RRMDP) framework focuses on designing control policies that are robust against parameter uncertainties arising from mismatches between the simulator (nominal) model and real-world settings. This work makes two contributions. First, we propose a model-free algorithm called Robust \(\phi\)-regularized fitted Q-iteration (RPQ) for learning an \(\epsilon\)-optimal robust policy using only historical data collected by rolling out a behavior policy (satisfying a robust exploratory requirement) on the nominal model. To the best of our knowledge, we provide the first unified analysis, covering a class of \(\phi\)-divergences, of learning robust optimal policies in high-dimensional systems with general function approximation. Second, we introduce the hybrid robust \(\phi\)-regularized reinforcement learning framework, which learns an optimal robust policy using both historical data and online sampling. Towards this framework, we propose a model-fre
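The abstract does not spell out the robust \(\phi\)-regularized Bellman update that a fitted Q-iteration scheme would regress onto. As a minimal sketch, assuming the KL divergence as the \(\phi\)-divergence, a tabular state-action space, and a fixed penalty weight \(\lambda\), the worst-case regularized expectation has the known closed form \(\inf_P \{\mathbb{E}_P[V] + \lambda D_{\mathrm{KL}}(P \| P^0)\} = -\lambda \log \mathbb{E}_{P^0}[e^{-V/\lambda}]\). Replacing \(\mathbb{E}_{P^0}\) with an average over logged next states gives the simplified illustration below. This is not the paper's RPQ algorithm (which handles general \(\phi\)-divergences and general function approximation); the functions `kl_robust_value` and `robust_fqi_kl` are hypothetical names used only here.

```python
import numpy as np


def kl_robust_value(next_values, lam):
    """Empirical KL-regularized worst-case expectation (assumed KL case):
    inf_P E_P[V] + lam * KL(P || P0) = -lam * log E_{P0}[exp(-V / lam)],
    where E_{P0} is replaced by a mean over sampled next-state values."""
    v = np.asarray(next_values, dtype=float)
    z = -v / lam
    m = z.max()  # shift for a numerically stable log-mean-exp
    return -lam * (m + np.log(np.mean(np.exp(z - m))))


def robust_fqi_kl(dataset, n_states, n_actions, gamma=0.9, lam=1.0, n_iters=200):
    """Tabular fitted Q-iteration with KL-regularized robust targets.

    dataset: list of (s, a, r, s_next) tuples logged by a behavior policy
    on the nominal model. Pairs (s, a) absent from the data keep Q = 0.
    """
    # Group logged transitions by state-action pair.
    groups = {}
    for s, a, r, s_next in dataset:
        groups.setdefault((s, a), []).append((r, s_next))

    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        v = q.max(axis=1)  # greedy value of the current Q estimate
        q_new = q.copy()
        for (s, a), samples in groups.items():
            rewards = np.array([r for r, _ in samples])
            next_v = np.array([v[s_next] for _, s_next in samples])
            # Robust target: mean reward + discounted worst-case regularized value.
            q_new[s, a] = rewards.mean() + gamma * kl_robust_value(next_v, lam)
        q = q_new
    return q
```

As \(\lambda \to \infty\) the penalty pins \(P\) to the nominal model and the target approaches the ordinary empirical mean; as \(\lambda \to 0\) it approaches the minimum over sampled next states, so \(\lambda\) trades off nominal performance against robustness.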

Authors

(none)

Tags

  • Model-Based RL

Stats

  • citations: 0
  • S2 citations: -
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: panaganti2024model
