Robust Reinforcement Learning Using Least Squares Policy Iteration With Provable Performance Guarantees
2020 · Kishan Panaganti, Dileep Kalathil
Abstract
This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Processes (RMDPs) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against parameter uncertainties arising from the mismatch between the simulator model and real-world settings. We first propose the Robust Least Squares Policy Evaluation algorithm, a multi-step, online, model-free learning algorithm for policy evaluation, and prove its convergence using stochastic approximation techniques. We then propose the Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy. We also give a general weighted Euclidean norm bound on the error (closeness to optimality) of the resulting policy. Finally, we demonstrate the performance of the RLSPI algorithm on some standard benchmark problems.
Related papers
- Sample Complexity Of Robust Reinforcement Learning With A Generative Model (2021)
- Robust Lagrangian And Adversarial Policy Gradient For Robust Constrained Markov Decision Processes (2023)
- Lyapunov Robust Constrained-MDPs: Soft-constrained Robustly Stable Policy Optimization Under Model Uncertainty (2021)
- Policy Learning For Robust Markov Decision Process With A Mismatched Generative Model (2022)
- Solving Robust MDPs Through No-regret Dynamics (2023)
- Model-free Robust \(\phi\)-divergence Reinforcement Learning Using Both Offline And Online Data (2024)
- Robust Model-based Reinforcement Learning With An Adversarial Auxiliary Model (2024)
- A Bayesian Approach To Robust Reinforcement Learning (2019)