Assumed Density Filtering Q-learning
2017 · Heejin Jeong, Clark Zhang, George J. Pappas, et al.
Abstract
While off-policy temporal difference (TD) methods have been widely used in reinforcement learning due to their efficiency and simple implementation, their Bayesian counterparts have not been utilized as frequently. One reason is that the non-linear max operation in the Bellman optimality equation makes it difficult to define conjugate distributions over the value functions. In this paper, we introduce a novel Bayesian approach to off-policy TD methods, called ADFQ, which updates beliefs on state-action values, Q, through an online Bayesian inference method known as Assumed Density Filtering. We formulate an efficient closed-form solution for the value update by approximately estimating analytic parameters of the posterior of the Q-beliefs. Uncertainty measures in the beliefs are not only used in exploration but also provide a natural regularization for the value update by considering all available next actions. ADFQ converges to Q-learning as the uncertainty measures of the Q-beliefs d
Related papers
- Simplifying Deep Temporal Difference Learning (2024)
- An Analysis Of Quantile Temporal-difference Learning (2023)
- Temporal-difference Value Estimation Via Uncertainty-guided Soft Updates (2021)
- Q-distribution Guided Q-learning For Offline Reinforcement Learning: Uncertainty Penalized Q-value Via Consistency Model (2024)
- The Statistical Benefits Of Quantile Temporal-difference Learning For Value Estimation (2023)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Neural Temporal-difference And Q-learning Provably Converge To Global Optima (2019)
- Time-scale Separation In Q-learning: Extending TD(\(\triangle\)) For Action-value Function Decomposition (2024)