Exploiting The Sign Of The Advantage Function To Learn Deterministic Policies In Continuous Domains
2019 · Matthieu Zimmer, Paul Weng
Abstract
In the context of learning deterministic policies in continuous domains, we revisit an approach that was first proposed in Continuous Actor Critic Learning Automaton (CACLA) and later extended in Neural Fitted Actor Critic (NFAC). This approach is based on a policy update that differs from that of deterministic policy gradient (DPG). Previous work has observed its excellent empirical performance, but a theoretical justification has been lacking. To fill this gap, we provide a theoretical explanation that motivates this unorthodox policy update by relating it to another update and making the objective function of the latter explicit. We furthermore discuss the properties of these updates in depth to gain a better understanding of the overall approach. In addition, we extend it and propose a new trust region algorithm, Penalized NFAC (PeNFAC). Finally, we demonstrate experimentally on several classic control problems that it surpasses state-of-the-art algorithms for learning deterministic policies.
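For readers unfamiliar with the CACLA-style update the abstract refers to, the following is a minimal sketch under stated assumptions, not the authors' NFAC or PeNFAC implementation. It assumes a hypothetical linear deterministic policy mu_w(s) = w · s with a scalar action, and assumes an advantage estimate A(s, a) for the executed action a is already available (in CACLA this role is played by the TD error). The key property illustrated is that only the *sign* of the advantage matters: when it is positive, the policy output is moved toward the taken action by one gradient step on 0.5 * (a - mu_w(s))^2; otherwise the policy is left unchanged. The names `policy_mean`, `cacla_update`, and `lr` are illustrative.

```python
from typing import List


def policy_mean(w: List[float], s: List[float]) -> float:
    """Hypothetical linear deterministic policy: mu_w(s) = w . s (scalar action)."""
    return sum(wi * si for wi, si in zip(w, s))


def cacla_update(w: List[float], s: List[float], a: float,
                 advantage: float, lr: float = 0.1) -> List[float]:
    """One CACLA-style actor update (sketch).

    Only the sign of the advantage is used: if A(s, a) > 0, take one gradient
    step on 0.5 * (a - mu_w(s))**2, pulling mu_w(s) toward the executed action a.
    If A(s, a) <= 0, the policy parameters are returned unchanged.
    The magnitude of the advantage never enters the step size.
    """
    if advantage <= 0:
        return list(w)
    err = a - policy_mean(w, s)
    # For a linear policy, the gradient of mu_w(s) with respect to w is s.
    return [wi + lr * err * si for wi, si in zip(w, s)]


if __name__ == "__main__":
    w0 = [0.0, 0.0]
    s = [1.0, 2.0]
    w1 = cacla_update(w0, s, a=1.0, advantage=0.5)
    print(w1, policy_mean(w1, s))  # policy output moves from 0.0 toward a = 1.0
    print(cacla_update(w0, s, a=1.0, advantage=-0.5))  # negative advantage: no change
```

Contrast this with DPG, which instead follows the gradient of the critic with respect to the action; the sketch above never differentiates the critic and uses it only to decide whether to update.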
Related papers
- Distributional Policy Optimization: An Alternative Approach For Continuous Control (2019)
- Actor-critic Reinforcement Learning With Phased Actor (2024)
- ACE : Off-policy Actor-critic With Causality-aware Entropy Regularization (2024)
- Attraction-repulsion Actor-critic For Continuous Control Reinforcement Learning (2019)
- Improving Exploration In Soft-actor-critic With Normalizing Flows Policies (2019)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Generative Actor-critic: An Off-policy Algorithm Using The Push-forward Model (2021)
- Learning Deterministic Policies With Policy Gradients In Constrained Markov Decision Processes (2025)