Deep Reinforcement Learning With Weighted Q-learning
2020 · Andrea Cini, Carlo D'Eramo, Jan Peters, et al.
Abstract
Reinforcement learning algorithms based on Q-learning are driving Deep Reinforcement Learning (DRL) research towards solving complex problems and achieving super-human performance on many of them. Nevertheless, Q-learning is known to be positively biased since it learns by using the maximum over noisy estimates of expected values. Systematic overestimation of the action values, coupled with the inherently high variance of DRL methods, can cause errors to accumulate incrementally and learning algorithms to diverge. Ideally, we would like DRL agents to take into account their own uncertainty about the optimality of each action, and be able to exploit it to make more informed estimations of the expected return. In this regard, Weighted Q-Learning (WQL) effectively reduces bias and shows remarkable results in stochastic environments. WQL uses a weighted sum of the estimated action values, where the weights correspond to the probability of each action value being the maximum; however, the
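The weighted estimator described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each action-value estimate is an independent Gaussian with a known mean and standard deviation, and it approximates the probability that each action is the maximum by Monte Carlo sampling. The function names `max_probabilities` and `weighted_estimate` and the numbers used are hypothetical and not taken from the paper; how the paper obtains these probabilities in the deep setting is not shown here.

```python
import random


def max_probabilities(means, stds, n_samples=10_000, seed=0):
    """Monte Carlo estimate of P(action i has the largest value).

    Assumption: each action value is an independent Gaussian N(mean, std^2).
    """
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(n_samples):
        draws = [rng.gauss(m, s) for m, s in zip(means, stds)]
        counts[draws.index(max(draws))] += 1
    return [c / n_samples for c in counts]


def weighted_estimate(means, stds, n_samples=10_000, seed=0):
    """Weighted sum of the mean estimates, weighted by P(action is the max)."""
    weights = max_probabilities(means, stds, n_samples, seed)
    return sum(w * m for w, m in zip(weights, means))


# Hypothetical noisy action-value estimates for three actions.
means = [1.0, 1.2, 0.8]
stds = [0.5, 0.5, 0.5]

weights = max_probabilities(means, stds)
w_est = weighted_estimate(means, stds)
max_est = max(means)  # the estimate used by the standard Q-learning target

print("weights:", weights)
print("weighted estimate:", w_est)
print("maximum estimate:", max_est)
```

Because the weights are non-negative and sum to one, the weighted estimate is a convex combination of the means and so can never exceed the maximum estimate. In this toy setting, that is what tempers the positive bias of taking the maximum over noisy values.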
Related papers
- Weighted Double Deep Multiagent Reinforcement Learning In Stochastic Cooperative Environments (2018)
- Utilizing Maximum Mean Discrepancy Barycenter For Propagating The Uncertainty Of Value Functions In Reinforcement Learning (2024)
- An Adaptive Synchronization Approach For Weights Of Deep Reinforcement Learning (2020)
- WD3: Taming The Estimation Bias In Deep Reinforcement Learning (2020)
- Minimax Weight And Q-function Learning For Off-policy Evaluation (2019)
- Zero-sum Positional Differential Games As A Framework For Robust Reinforcement Learning: Deep Q-learning Approach (2024)
- An Information-theoretic Optimality Principle For Deep Reinforcement Learning (2017)
- Loss- And Reward-weighting For Efficient Distributed Reinforcement Learning (2023)