Neural Policy Gradient Methods: Global Optimality And Rates Of Convergence
2019 · Lingxiao Wang, Qi Cai, Zhuoran Yang, et al.
Abstract
Policy gradient methods with actor-critic schemes demonstrate tremendous empirical successes, especially when the actors and critics are parameterized by neural networks. However, it remains less clear whether such "neural" policy gradient methods converge to globally optimal policies and whether they even converge at all. We answer both the questions affirmatively in the overparameterized regime. In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate. Also, we show that neural vanilla policy gradient converges sublinearly to a stationary point. Meanwhile, by relating the suboptimality of the stationary points to the representation power of neural actor and critic classes, we prove the global optimality of all stationary points under mild regularity conditions. Particularly, we show that a key to the global optimality and convergence is the "compatibility" between the actor and critic, which is ensured by sharing neural archit
Related papers
- Single-timescale Actor-critic Provably Finds Globally Optimal Policy (2020)
- Neural Proximal/trust Region Policy Optimization Attains Globally Optimal Policy (2019)
- Beyond The Policy Gradient Theorem For Efficient Policy Updates In Actor-critic Algorithms (2022)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- Symmetric (optimistic) Natural Policy Gradient For Multi-agent Learning With Parameter Convergence (2022)
- Convergence And Optimality Of Policy Gradient Methods In Weakly Smooth Settings (2021)
- Value Improved Actor Critic Algorithms (2024)