Policy Learning For Robust Markov Decision Process With A Mismatched Generative Model
2022 · Jialian Li, Tongzheng Ren, Dong Yan, et al.
Abstract
In high-stakes scenarios like medical treatment and auto-piloting, it is risky or even infeasible to collect online experimental data to train the agent. Simulation-based training can alleviate this issue, but it may suffer from inherent mismatches between the simulator and the real environment. It is therefore imperative to use the simulator to learn a robust policy for real-world deployment. In this work, we consider policy learning for Robust Markov Decision Processes (RMDP), where the agent seeks a policy that is robust to unexpected perturbations of the environment. Specifically, we focus on the setting where the training environment can be characterized as a generative model and a constrained perturbation can be added to the model during testing. Our goal is to identify a near-optimal robust policy for the perturbed testing environment, which introduces additional technical difficulties as we need to simultaneously estimate the training environment uncertainty from
Related papers
- Sample Complexity Of Robust Reinforcement Learning With A Generative Model (2021)
- Robust Lagrangian And Adversarial Policy Gradient For Robust Constrained Markov Decision Processes (2023)
- Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees (2025)
- Robust Anytime Learning Of Markov Decision Processes (2022)
- Robust Reinforcement Learning Using Least Squares Policy Iteration With Provable Performance Guarantees (2020)
- Robust Batch Policy Learning In Markov Decision Processes (2020)
- The Curious Price Of Distributional Robustness In Reinforcement Learning With A Generative Model (2023)
- Model-free Robust \(\phi\)-divergence Reinforcement Learning Using Both Offline And Online Data (2024)