Revisiting Gaussian Mixture Critics In Off-policy Reinforcement Learning: A Sample-based Approach
2022 · Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, et al.
Abstract
Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks. Examples of this behavior include the D4PG and DMPO algorithms as compared to DDPG and MPO, respectively [Barth-Maron et al., 2018; Hoffman et al., 2020]. However, both agents rely on the C51 critic for value estimation. One major drawback of the C51 approach is its requirement of prior knowledge about the minimum and maximum values a policy can attain as well as the number of bins used, which fixes the resolution of the distributional estimate. While the DeepMind control suite of tasks utilizes standardized rewards and episode lengths, thus enabling the entire suite to be solved with a single setting of these hyperparameters, this is often not the case. This paper revisits a natural alternative that removes this requirement, namely a mixture of Gaussians, and a simple sample-based loss function to train
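To make the contrast in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It first shows how a C51-style critic ties its resolution to three hyperparameters chosen in advance (minimum value, maximum value, number of bins), then shows one plausible reading of a "sample-based" objective for a Gaussian mixture critic: the negative log-likelihood of sampled target returns under the mixture. All function names, the values -150, 150 and 51, and the choice of negative log-likelihood are illustrative assumptions; the abstract does not specify the loss.

```python
import numpy as np

def c51_support(v_min, v_max, num_atoms):
    """Fixed, evenly spaced return values (atoms) of a categorical critic.

    The spacing between atoms is the finest return difference the critic
    can represent, and it is fully determined by the three arguments.
    """
    atoms = np.linspace(v_min, v_max, num_atoms)
    resolution = (v_max - v_min) / (num_atoms - 1)
    return atoms, resolution

def mog_log_prob(x, logits, means, stds):
    """Log-density of a 1-D Gaussian mixture at each sample in x."""
    x = np.asarray(x, dtype=float)[:, None]           # [num_samples, 1]
    logits = np.asarray(logits, dtype=float)          # [num_components]
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    log_w = logits - np.logaddexp.reduce(logits)      # normalised log weights
    log_comp = (-0.5 * ((x - means) / stds) ** 2
                - np.log(stds) - 0.5 * np.log(2.0 * np.pi))
    # Log-sum-exp over components gives the mixture log-density.
    return np.logaddexp.reduce(log_w + log_comp, axis=1)

def sample_based_nll(target_samples, logits, means, stds):
    """ASSUMPTION: average negative log-likelihood of sampled target returns."""
    return -np.mean(mog_log_prob(target_samples, logits, means, stds))

# Illustrative values only: 51 atoms on [-150, 150] give a spacing of 6.
atoms, resolution = c51_support(-150.0, 150.0, 51)

# A mixture has no fixed support: means and scales are free parameters.
loss = sample_based_nll([0.0], logits=[0.0], means=[0.0], stds=[1.0])
```

In this sketch the categorical support cannot represent returns outside `[v_min, v_max]` or differences finer than `resolution`, whereas the mixture parameters (`logits`, `means`, `stds`) carry no such preset bounds.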
Related papers
- Sample-efficient Model-free Reinforcement Learning With Off-policy Critics (2019)
- How To Learn A Useful Critic? Model-based Action-gradient-estimator Policy Optimization (2020)
- Distributional Soft Actor-critic With Diffusion Policy (2025)
- Generative Actor-critic: An Off-policy Algorithm Using The Push-forward Model (2021)
- An Approximate Policy Iteration Viewpoint Of Actor-critic Algorithms (2022)
- Doubly Robust Off-policy Actor-critic Algorithms For Reinforcement Learning (2019)
- Neural Network Compatible Off-policy Natural Actor-critic Algorithm (2021)
- Mitigating Off-policy Bias In Actor-critic Methods With One-step Q-learning: A Novel Correction Approach (2022)