A General Class Of Surrogate Functions For Stable And Efficient Reinforcement Learning
2021 · Sharan Vaswani, Olivier Bachem, Simone Totaro, et al.
Abstract
Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives rise to an entire family of surrogate functions. We construct surrogate functions that enable policy improvement guarantees, a property not shared by most existing surrogate functions. Crucially, these guarantees hold regardless of the choice of policy parameterization. Moreover, a particular instantiation of FMA-PG recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) resulting in a variant of TRPO with additional desirable properties. Via experiments on simple bandit problems, we evaluate the algorithms instantiated by FMA-PG. The proposed framework also su
Related papers
- Proximal Policy Optimization Algorithms (2017)
- A Novel Framework For Policy Mirror Descent With General Parameterization And Linear Convergence (2023)
- Smoothing Policies And Safe Policy Gradients (2019)
- Mirror Learning: A Unifying Framework Of Policy Optimisation (2022)
- On The Global Optimality Of Policy Gradient Methods In General Utility Reinforcement Learning (2024)
- Advantage Shaping As Surrogate Reward Maximization: Unifying Pass@k Policy Gradients (2025)
- PC-PG: Policy Cover Directed Exploration For Provable Policy Gradient Learning (2020)
- A Parametric Class Of Approximate Gradient Updates For Policy Optimization (2022)