ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
2023 · Andrew Jesson, Chris Lu, Gunshi Gupta, et al.
Abstract
This paper proposes a step toward approximate Bayesian inference in on-policy actor-critic deep reinforcement learning. It is implemented through three changes to the Asynchronous Advantage Actor-Critic (A3C) algorithm: (1) applying a ReLU function to advantage estimates, (2) spectral normalization of actor-critic weights, and (3) incorporating *dropout as a Bayesian approximation*. We prove under standard assumptions that restricting policy updates to positive advantages optimizes for value by maximizing a lower bound on the value function plus an additive term. We show that the additive term is bounded proportional to the Lipschitz constant of the value function, which offers theoretical grounding for spectral normalization of critic weights. Finally, our application of dropout corresponds to approximate Bayesian inference over both the actor and critic parameters, which enables *adaptive state-aware* exploration around the modes of the actor via Thompson sampling. We demons
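The three changes named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`positive_advantage_loss`, `spectral_norm`, `spectral_normalize`, `dropout`) are hypothetical, plain Python lists stand in for network tensors, and the loss is a generic policy-gradient surrogate rather than the exact A3C objective used in the paper.

```python
import math
import random


def positive_advantage_loss(log_probs, advantages):
    """Policy-gradient surrogate that keeps only positive advantages.

    Applying ReLU zeroes every sample whose advantage is negative, so only
    actions that did better than the critic's estimate drive the update.
    """
    if len(log_probs) != len(advantages):
        raise ValueError("log_probs and advantages must have equal length")
    weighted = [lp * max(0.0, adv) for lp, adv in zip(log_probs, advantages)]
    return -sum(weighted) / len(weighted)


def _matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def _transpose_matvec(matrix, vec):
    cols = len(matrix[0])
    return [sum(matrix[i][j] * vec[i] for i in range(len(matrix)))
            for j in range(cols)]


def spectral_norm(matrix, iters=50):
    """Estimate the largest singular value by power iteration on W^T W.

    Assumption: the uniform start vector is not orthogonal to the top
    singular vector; a random start would be more robust in practice.
    """
    cols = len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        w = _transpose_matvec(matrix, _matvec(matrix, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return math.sqrt(sum(x * x for x in _matvec(matrix, v)))


def spectral_normalize(matrix, iters=50):
    """Rescale a weight matrix so its spectral norm is approximately 1."""
    sigma = spectral_norm(matrix, iters)
    if sigma == 0.0:
        return [row[:] for row in matrix]
    return [[w / sigma for w in row] for row in matrix]


def dropout(values, rate, rng):
    """Inverted dropout that stays active when acting, not only in training.

    Drawing a fresh mask per forward pass is the sense in which dropout
    approximates sampling network parameters for Thompson-style exploration.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in values]
```

For example, `positive_advantage_loss([-1.0, -2.0], [2.0, -3.0])` ignores the second sample entirely because its advantage is negative, and `dropout(hidden, 0.1, random.Random(0))` would be called once per action selection so that each decision uses a different sampled mask.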
Related papers
- Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees (2023)
- ADER: Adapting Between Exploration and Robustness for Actor-Critic Methods (2021)
- Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning (2021)
- Improving Actor-Critic Training with Steerable Action-Value Approximation Errors (2024)
- Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty (2026)
- An Approximate Policy Iteration Viewpoint of Actor-Critic Algorithms (2022)
- Recursive Least Squares Advantage Actor-Critic Algorithms (2022)
- Importance Weighted Actor-Critic for Optimal Conservative Offline Reinforcement Learning (2023)