Bayesian Distributional Policy Gradients
2021 · Luchen Li, A. Aldo Faisal
Abstract
Distributional Reinforcement Learning (RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing more learning signals that account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and for policy learning in general. Previous works in distributional RL focused mainly on computing the state-action-return distributions; here we model the state-return distributions. This enables us to translate successful conventional RL algorithms that are based on state values into distributional RL. We formulate the distributional Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between target and model return distributions. The proposed algorithm, BDPG (Bayesian Distributional Policy Gradients), uses adversarial training in joint-contrastive learning to estimate a variational posterior from the returns. Moreover, we can now interpret the return pre
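To make the idea of comparing target and model return distributions concrete, the following is a minimal sketch and not the BDPG algorithm itself. It assumes both distributions are represented by equally sized sets of scalar return samples, in which case the 1-Wasserstein distance reduces to the mean absolute difference between sorted samples. The function names `bellman_target_samples` and `wasserstein1`, and all numeric values, are hypothetical and chosen only for illustration.

```python
def bellman_target_samples(reward, gamma, next_return_samples):
    """Sample-based distributional Bellman backup for a state-return
    distribution: each target sample is r + gamma * z', with z' drawn
    from the return distribution of the next state (assumed given)."""
    return [reward + gamma * z for z in next_return_samples]


def wasserstein1(samples_a, samples_b):
    """1-Wasserstein distance between two equal-size 1-D empirical
    distributions: mean absolute difference of the sorted samples."""
    if len(samples_a) != len(samples_b):
        raise ValueError("sample sets must have the same size")
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical samples of the model's return distribution at state s
model_returns = [0.0, 1.0, 2.0, 3.0]
# Hypothetical samples of the return distribution at the next state s'
next_returns = [2.0, 0.0, 4.0, 2.0]

target_returns = bellman_target_samples(reward=1.0, gamma=0.5,
                                        next_return_samples=next_returns)
distance = wasserstein1(model_returns, target_returns)
print(distance)  # 0.5
```

In this sketch the distance is zero only when the sorted model samples match the sorted target samples, which is the sample-level analogue of the model distribution being a fixed point of the Bellman backup. The abstract describes estimating the model via adversarial training and a variational posterior; none of that is reproduced here.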
Related papers
- Pg-rainbow: Using Distributional Reinforcement Learning In Policy Gradient Methods (2024)
- Nonlinear Distributional Gradient Temporal-difference Learning (2018)
- Bayesian Policy Gradients Via Alpha Divergence Dropout Inference (2017)
- Distributional Reinforcement Learning For Multi-dimensional Reward Functions (2021)
- Distributional Soft Actor-critic With Diffusion Policy (2025)
- Rethinking Adversarial Attacks In Reinforcement Learning From Policy Distribution Perspective (2025)
- Normality-guided Distributional Reinforcement Learning For Continuous Control (2022)
- Continuous Control Reinforcement Learning: Distributed Distributional Drq Algorithms (2024)