An Analytical Update Rule For General Policy Optimization
2021 · Hepeng Li, Nicholas Clavette, Haibo He
Abstract
We present an analytical policy update rule that is independent of parametric function approximators. The update rule is suitable for optimizing general stochastic policies and has a monotonic improvement guarantee. It is derived from a closed-form solution to trust-region optimization using the calculus of variations, following a new theoretical result that tightens existing bounds for policy improvement using trust-region methods. The update rule builds a connection between policy search methods and value function methods. Moreover, off-policy reinforcement learning algorithms can be derived from the update rule, since it does not require integrating over on-policy states. In addition, the update rule extends immediately to cooperative multi-agent systems when policy updates are performed by one agent at a time.
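The abstract does not state the update rule itself. As a rough illustration of what a closed-form, approximator-free policy update can look like, the sketch below uses the well-known exponentiated-advantage form, pi_new(a|s) proportional to pi_old(a|s) * exp(A(s,a) / temperature), which solves a KL-regularized improvement problem for a tabular policy. This form, the function name `exp_advantage_update`, the `temperature` parameter, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
import math


def exp_advantage_update(pi_old, q_values, temperature=1.0):
    """Hypothetical tabular update: pi_new(a|s) ∝ pi_old(a|s) * exp(A(s,a) / temperature).

    pi_old and q_values are lists of rows, one row per state and one entry
    per action. The advantage is taken relative to the old policy's value.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    pi_new = []
    for probs, qs in zip(pi_old, q_values):
        # State value under the old policy: V(s) = sum_a pi_old(a|s) Q(s,a).
        v = sum(p * q for p, q in zip(probs, qs))
        adv = [q - v for q in qs]
        # Subtract the largest advantage before exponentiating for numerical
        # stability; the shift cancels in the normalization.
        shift = max(adv)
        weights = [p * math.exp((a - shift) / temperature)
                   for p, a in zip(probs, adv)]
        z = sum(weights)
        pi_new.append([w / z for w in weights])
    return pi_new


# Toy problem (assumed values): 2 states, 3 actions, uniform old policy.
pi_old = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
q_values = [[1.0, 0.0, -1.0], [0.5, 0.5, 2.0]]
pi_new = exp_advantage_update(pi_old, q_values, temperature=0.5)
```

A smaller `temperature` moves the new policy further from the old one, playing a role loosely analogous to a larger trust region. In this toy setting the expected Q-value under `pi_new` is never lower than under `pi_old` in any state, which mirrors, in a much weaker per-state sense, the kind of improvement property the abstract refers to.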
Related papers
- A Parametric Class Of Approximate Gradient Updates For Policy Optimization (2022)
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)
- Order Matters: Agent-by-agent Policy Optimization (2023)
- Simple Policy Optimization (2024)
- Absolute Policy Optimization (2023)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)
- Uncertainty-aware Policy Optimization: A Robust, Adaptive Trust Region Approach (2020)
- Adaptive Trust Region Policy Optimization: Global Convergence And Faster Rates For Regularized MDPs (2019)