Easy Monotonic Policy Iteration
2016 · Joshua Achiam
Abstract
A key problem in reinforcement learning for control with general function approximators (such as deep neural networks and other nonlinear functions) is that, for many algorithms employed in practice, updates to the policy or \(Q\)-function may fail to improve performance or, worse, actually cause the policy performance to degrade. Prior work has addressed this for policy iteration by deriving tight policy improvement bounds; by optimizing the lower bound on policy improvement, a better policy is guaranteed. However, existing approaches suffer from bounds that are hard to optimize in practice because they include sup norm terms, which cannot be efficiently estimated or differentiated. In this work, we derive a better policy improvement bound where the sup norm of the policy divergence has been replaced with an average divergence; this leads to an algorithm, Easy Monotonic Policy Iteration, that generates sequences of policies with guaranteed non-decreasing returns and is easy to implement.
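The contrast between a sup norm divergence and an average divergence can be illustrated with a small sketch. This is not the paper's algorithm or its bound. It assumes tabular policies, uses total variation as the divergence (an assumption made only for this example), and all names and numbers below are hypothetical.

```python
# Illustrative sketch only: tabular policies stored as per-state action
# distributions. Total variation is an assumed choice of divergence here.

def tv_divergence(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def sup_divergence(policy_new, policy_old):
    """Worst-case (sup norm) divergence: requires every state, visited or not."""
    return max(tv_divergence(pn, po) for pn, po in zip(policy_new, policy_old))

def average_divergence(policy_new, policy_old, sampled_states):
    """Average divergence over sampled states: a plain Monte Carlo mean."""
    total = sum(tv_divergence(policy_new[s], policy_old[s]) for s in sampled_states)
    return total / len(sampled_states)

# Hypothetical 3-state, 2-action policies.
policy_old = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
policy_new = [[0.5, 0.5], [0.6, 0.4], [1.0, 0.0]]

# Hypothetical states visited under the old policy; state 2 is rarely seen.
sampled_states = [0, 0, 0, 1, 1, 1, 1, 0, 0, 2]

sup_div = sup_divergence(policy_new, policy_old)
avg_div = average_divergence(policy_new, policy_old, sampled_states)
print("sup divergence:", sup_div)
print("average divergence:", avg_div)
```

The sup version needs the divergence at every state, including ones the sampled trajectories never reach, whereas the average version is a mean over visited states and can be estimated directly from samples. The average is never larger than the sup, since a mean cannot exceed the maximum of its terms.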
Related papers
- An Approximate Policy Iteration Viewpoint Of Actor-critic Algorithms (2022)
- Dual Policy Iteration (2018)
- Iterative Amortized Policy Optimization (2020)
- Variance-reduced Conservative Policy Iteration (2022)
- On The Convergence Of Policy Iteration-based Reinforcement Learning With Monte Carlo Policy Evaluation (2023)
- A New Policy Iteration Algorithm For Reinforcement Learning In Zero-sum Markov Games (2023)
- Learning Self-imitating Diverse Policies (2018)
- Some Remarks On Gradient Dominance And LQR Policy Optimization (2025)