Greedy Actor-critic: A New Conditional Cross-entropy Method For Policy Improvement
2018 · Samuel Neumann, Sungsu Lim, Ajin Joseph, et al.
Abstract
Many policy gradient methods are variants of Actor-Critic (AC), where a value function (critic) is learned to facilitate updating the parameterized policy (actor). The update to the actor involves a log-likelihood update weighted by the action-values, with the addition of entropy regularization for soft variants. In this work, we explore an alternative update for the actor, based on an extension of the cross-entropy method (CEM) to condition on inputs (states). The idea is to start with a broader policy and slowly concentrate around maximal actions, using a maximum likelihood update towards actions in the top percentile per state. The speed of this concentration is controlled by a proposal policy that concentrates at a slower rate than the actor. We first provide a policy improvement result in an idealized setting, and then prove that our conditional CEM (CCEM) strategy tracks a CEM update per state, even with changing action-values. We empirically show that our Greedy AC algorithm, t
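The percentile-based update described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single fixed state, a one-dimensional action, Gaussian actor and proposal policies, and a known hypothetical action-value function `q_value`. It also models the proposal's slower concentration simply by giving it a smaller step size, which is an assumption made here for illustration. All function names and parameters are hypothetical.

```python
import random
import statistics


def q_value(action):
    """Hypothetical action-value for one fixed state, maximised at action = 2.0."""
    return -(action - 2.0) ** 2


def top_fraction(actions, rho):
    """Return the top rho fraction of actions ranked by q_value (at least two)."""
    k = max(2, int(len(actions) * rho))
    return sorted(actions, key=q_value, reverse=True)[:k]


def cem_step(mean, std, elites, step):
    """Move a Gaussian's (mean, std) part of the way to the elites' maximum likelihood fit."""
    target_mean = statistics.fmean(elites)
    target_std = statistics.pstdev(elites)
    new_mean = mean + step * (target_mean - mean)
    # A small floor keeps the Gaussian from collapsing to zero width.
    new_std = max(1e-3, std + step * (target_std - std))
    return new_mean, new_std


def run(iterations=100, n_samples=30, rho=0.2, seed=0):
    """Run the sketch and return (actor, proposal) as (mean, std) pairs."""
    rng = random.Random(seed)
    actor = (0.0, 3.0)     # both policies start broad
    proposal = (0.0, 3.0)
    for _ in range(iterations):
        # Sample candidate actions from the proposal policy.
        actions = [rng.gauss(proposal[0], proposal[1]) for _ in range(n_samples)]
        # Keep the actions in the top percentile for this state.
        elites = top_fraction(actions, rho)
        # Actor moves quickly toward the elites; the proposal moves more slowly
        # (assumption: slower concentration is modelled by a smaller step size).
        actor = cem_step(actor[0], actor[1], elites, step=0.2)
        proposal = cem_step(proposal[0], proposal[1], elites, step=0.05)
    return actor, proposal


if __name__ == "__main__":
    actor, proposal = run()
    print("actor (mean, std):", actor)
    print("proposal (mean, std):", proposal)
```

In this toy setting the actor's mean drifts toward the maximising action while the proposal stays broader for longer, which keeps candidate actions spread out. A conditional version would replace the two `(mean, std)` pairs with networks that map states to distribution parameters and would use a learned critic in place of `q_value`.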
Related papers
- Beyond The Policy Gradient Theorem For Efficient Policy Updates In Actor-critic Algorithms (2022)
- Off-policy Actor-critic In An Ensemble: Achieving Maximum General Entropy And Effective Environment Exploration In Deep Reinforcement Learning (2019)
- Value Improved Actor Critic Algorithms (2024)
- Guide Actor-critic For Continuous Control (2017)
- Boosting Exploration In Actor-critic Algorithms By Incentivizing Plausible Novel States (2022)
- ACE: Off-policy Actor-critic With Causality-aware Entropy Regularization (2024)
- How To Learn A Useful Critic? Model-based Action-gradient-estimator Policy Optimization (2020)
- S\(^2\)AC: Energy-based Reinforcement Learning With Stein Soft Actor Critic (2024)