Iterative Amortized Policy Optimization
2020 · Joseph Marino, Alexandre Piché, Alessandro Davide Ialongo, et al.
Abstract
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control, enabling the estimation and sampling of high-value actions. From the variational inference perspective on RL, policy networks, when used with entropy or KL regularization, are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly. However, direct amortized mappings can yield suboptimal policy estimates and restricted distributions, limiting performance and exploration. Given this perspective, we consider the more flexible class of iterative amortized optimizers. We demonstrate that the resulting technique, iterative amortized policy optimization, yields performance improvements over direct amortization on benchmark continuous control tasks.
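The contrast between direct and iterative amortization can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and none of it comes from the paper: the toy critic `q_value`, the linear map standing in for a trained policy network, and the fixed-step gradient update standing in for a learned iterative optimizer. It shows only the structural difference: a direct mapping outputs a policy mean in one pass, whereas an iterative scheme refines an estimate using feedback from the objective.

```python
import numpy as np


def q_value(state, action):
    # Hypothetical toy critic: the best action for a state is sin(state).
    return -(action - np.sin(state)) ** 2


def q_grad(state, action):
    # Analytic gradient of the toy critic with respect to the action.
    return -2.0 * (action - np.sin(state))


def direct_policy_mean(state, weight=0.8):
    # Direct amortization: one feed-forward mapping from state to mean.
    # A fixed linear map stands in for a trained policy network.
    return weight * state


def iterative_policy_mean(state, num_steps=5, step_size=0.3):
    # Iterative amortization (sketch): start from an initial estimate and
    # refine it with updates that depend on the current estimate and the
    # gradient of the objective. A fixed step size stands in for a
    # learned update network.
    mean = direct_policy_mean(state)
    for _ in range(num_steps):
        mean = mean + step_size * q_grad(state, mean)
    return mean


state = 2.0
direct = direct_policy_mean(state)
iterative = iterative_policy_mean(state)
print("direct mean:", direct, "value:", q_value(state, direct))
print("iterative mean:", iterative, "value:", q_value(state, iterative))
```

In this toy setting the refined estimate ends up closer to the critic's optimum than the one-pass estimate. This mirrors the suboptimality of direct mappings that the abstract describes, but it says nothing about the benchmark results themselves.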
Related papers
- Regularization Matters In Policy Optimization (2019) 2.68
- Policy Optimization For Continuous Reinforcement Learning (2023) 2.26
- Dual Policy Iteration (2018) 0.00
- Policy Optimization In A Noisy Neighborhood: On Return Landscapes In Continuous Control (2023) 0.00
- Near-future Policy Optimization (2026) 0.00
- Conservative Optimistic Policy Optimization Via Multiple Importance Sampling (2021) 0.00
- Provably Efficient Exploration In Policy Optimization (2019) 0.00
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026) 0.00