A Novel Framework For Policy Mirror Descent With General Parameterization And Linear Convergence
2023 · Carlo Alfano, Rui Yuan, Patrick Rebeschini
Abstract
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies. However, while theoretical guarantees have been established for this class of algorithms, especially in the tabular setting, the use of general parameterization schemes remains mostly unjustified. In this work, we introduce a novel framework for policy optimization based on mirror descent that naturally accommodates general parameterizations. The policy class induced by our scheme recovers known classes, e.g., softmax, and generates new ones depending on the choice of mirror map. Using our framework, we obtain the first result that guarantees linear convergence for a policy-gradient-based method involving general parameterization. To demonstrate the ability of our framework to accommodate general parameterization schemes, we provide its sample complexity when using shallow neural networks, show that it represents an improvement upon the previous be
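The remark that the scheme recovers the softmax class can be illustrated with the standard tabular special case. The sketch below is not the paper's general-parameterization algorithm. It assumes the well-known closed form of a mirror descent step under the negative-entropy mirror map, in which the new policy is proportional to the old policy times exp(step_size * Q). The names `softmax`, `pmd_step_neg_entropy`, `q_values`, and `step_size` are hypothetical, and the Q-values are random placeholders rather than estimates from an environment.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def pmd_step_neg_entropy(policy, q_values, step_size):
    """One tabular mirror descent step under the negative-entropy mirror map.

    Assumed closed form: new_policy(a|s) is proportional to
    policy(a|s) * exp(step_size * Q(s, a)).
    Requires strictly positive policy entries so the logarithm is finite.
    """
    log_unnormalized = np.log(policy) + step_size * q_values
    return softmax(log_unnormalized)


# Illustrative data only: 3 states, 4 actions, random logits and Q-values.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4))
q_values = rng.normal(size=(3, 4))
step_size = 0.5

policy = softmax(logits)
new_policy = pmd_step_neg_entropy(policy, q_values, step_size)

# The updated policy stays in the softmax class: its logits are shifted by
# step_size * Q, up to a per-state constant that softmax ignores.
same_class = np.allclose(new_policy, softmax(logits + step_size * q_values))
```

Under this assumption, `same_class` confirms that the update only moves the logits, which is one way to see why a softmax policy class arises from this particular mirror map. Other mirror maps would generally give a different update and a different policy class.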
Related papers
- Mirror Learning: A Unifying Framework Of Policy Optimisation (2022)
- Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy (2019)
- Policy Optimization Over General State And Action Spaces (2022)
- Mirror Descent Policy Optimisation For Robust Constrained Markov Decision Processes (2025)
- Learning Mirror Maps In Policy Mirror Descent (2024)
- A Parametric Class Of Approximate Gradient Updates For Policy Optimization (2022)
- A General Class Of Surrogate Functions For Stable And Efficient Reinforcement Learning (2021)
- Policy Gradient For Robust Markov Decision Processes (2024)