Policy Gradient Method For Robust Reinforcement Learning
2022 · Yue Wang, Shaofeng Zou
Abstract
This paper develops the first policy gradient method with a global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch. Robust reinforcement learning aims to learn a policy that is robust to model mismatch between the simulator and the real environment. We first develop the robust policy (sub-)gradient, which is applicable to any differentiable parametric policy class. We show that the proposed robust policy gradient method converges to the global optimum asymptotically under direct policy parameterization. We further develop a smoothed robust policy gradient method and show that, to achieve an \(\epsilon\)-global optimum, the complexity is \(\mathcal{O}(\epsilon^{-3})\). We then extend our methodology to the general model-free setting and design the robust actor-critic method with a differentiable parametric policy class and value function. We further characterize its asymptotic convergence and sample complexity under the tabular setting. Finally, we provide
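To make the idea of robustness to model mismatch concrete, here is a minimal sketch of robust policy evaluation in a tabular MDP. It is an illustration under stated assumptions, not the method from the paper: the abstract does not name the uncertainty set, so the sketch assumes an R-contamination set in which the true transition kernel is `(1 - R) * p + R * q` for an arbitrary distribution `q`, and it uses a reward-maximizing convention, so the adversary pushes mass toward the lowest-value state. The function name, parameters, and toy MDP are all hypothetical.

```python
def robust_policy_evaluation(P, r, pi, gamma=0.9, R=0.1, tol=1e-10, max_iter=10_000):
    """Evaluate policy `pi` under an assumed R-contamination uncertainty set.

    P[s][a][s2] : nominal transition probability (hypothetical toy model)
    r[s][a]     : reward for taking action a in state s
    pi[s][a]    : probability the policy picks action a in state s
    R           : contamination level in [0, 1]; R = 0 recovers the nominal MDP
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(max_iter):
        # Under R-contamination the worst case places the contaminated
        # mass R on the state with the lowest current value.
        worst = min(V)
        V_new = []
        for s in range(n_states):
            v = 0.0
            for a, prob_a in enumerate(pi[s]):
                nominal = sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                v += prob_a * (r[s][a] + gamma * ((1 - R) * nominal + R * worst))
            V_new.append(v)
        diff = max(abs(x - y) for x, y in zip(V, V_new))
        V = V_new
        if diff < tol:  # the operator is a gamma-contraction, so this terminates
            break
    return V


# Toy two-state, two-action MDP with a uniform policy (illustrative values only).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
r = [[1.0, 0.0],
     [0.0, 2.0]]
pi = [[0.5, 0.5],
      [0.5, 0.5]]

V_nominal = robust_policy_evaluation(P, r, pi, R=0.0)
V_robust = robust_policy_evaluation(P, r, pi, R=0.2)
```

In this sketch the robust values can never exceed the nominal ones, since the adversary can only lower the expected next-state value. A robust policy gradient method would then adjust the policy to improve this worst-case value rather than the nominal one.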
Related papers
- Policy Gradient For Robust Markov Decision Processes (2024)
- Analysis Of On-policy Policy Gradient Methods Under The Distribution Mismatch (2025)
- Reinforcement Learning Under Model Mismatch (2017)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- PC-PG: Policy Cover Directed Exploration For Provable Policy Gradient Learning (2020)
- On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift (2019)
- On The Global Optimality Of Policy Gradient Methods In General Utility Reinforcement Learning (2024)