Utilizing Maximum Mean Discrepancy Barycenter For Propagating The Uncertainty Of Value Functions In Reinforcement Learning
2024 · Srinjoy Roy, Swagatam Das
Abstract
Accounting for the uncertainty of value functions boosts exploration in Reinforcement Learning (RL). Our work introduces Maximum Mean Discrepancy Q-Learning (MMD-QL), which improves on Wasserstein Q-Learning (WQL) for uncertainty propagation during Temporal Difference (TD) updates. MMD-QL uses the MMD barycenter for this purpose, as MMD provides a tighter estimate of closeness between probability measures than the Wasserstein distance. First, we establish that MMD-QL is Probably Approximately Correct in MDP (PAC-MDP) under the average loss metric. In terms of accumulated rewards, experiments on tabular environments show that MMD-QL outperforms WQL and other algorithms. Second, we incorporate deep networks into MMD-QL to create the MMD Q-Network (MMD-QN). Under reasonable assumptions, we analyze the convergence rates of MMD-QN with function approximation. Empirical results on challenging Atari games demonstrate that MMD-QN performs well compared to benchmark deep RL algorithms, highlighting…
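The abstract says MMD-QL propagates value uncertainty through an MMD barycenter during TD updates. The sketch below is only an illustration of that idea, not the authors' implementation: it computes the squared MMD between two weighted discrete measures under a Gaussian kernel, and a weighted MMD barycenter of them. Everything specific here is an assumption: the Gaussian kernel and its bandwidth, representing a value distribution as weighted samples, weighting the current estimate and the TD target by (1 - alpha, alpha) as a learning rate, and every function name (all hypothetical). The paper may instead restrict the barycenter to a parametric family, which would need a different solver.

```python
import math

# Hypothetical helpers for illustration; none of these names come from the paper.

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars (assumed kernel choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def empirical(samples):
    """Uniform discrete measure over the samples, as [(point, weight), ...]."""
    w = 1.0 / len(samples)
    return [(x, w) for x in samples]


def _cross(p, q, bandwidth):
    """sum_ij p_i q_j k(x_i, y_j): RKHS inner product of the two mean embeddings."""
    return sum(wp * wq * rbf_kernel(xp, xq, bandwidth) for xp, wp in p for xq, wq in q)


def mmd2(p, q, bandwidth=1.0):
    """Squared MMD between two weighted discrete measures."""
    return _cross(p, p, bandwidth) + _cross(q, q, bandwidth) - 2.0 * _cross(p, q, bandwidth)


def barycenter_objective(nu, measures, weights, bandwidth=1.0):
    """sum_i lambda_i * MMD^2(nu, mu_i), the quantity an MMD barycenter minimizes."""
    return sum(lam * mmd2(nu, mu, bandwidth) for mu, lam in zip(measures, weights))


def mixture_barycenter(measures, weights):
    """Mixture sum_i lambda_i * mu_i (weights nonnegative, summing to 1). Its mean
    embedding is the weighted average of the mu_i embeddings, so it minimizes
    barycenter_objective over all probability measures."""
    return [(x, lam * w) for mu, lam in zip(measures, weights) for x, w in mu]


# Usage: hypothetical samples of Q(s, a) now and of its TD target; alpha is assumed.
alpha = 0.1
current = empirical([-0.5, 0.0, 0.5, 1.0])
target = empirical([1.5, 2.0, 2.5])
weights = [1.0 - alpha, alpha]
updated = mixture_barycenter([current, target], weights)
print(barycenter_objective(updated, [current, target], weights))
print(barycenter_objective(current, [current, target], weights))
```

Because squared MMD is a squared RKHS distance between mean embeddings, the objective splits into the distance from the weighted average embedding plus a constant, which is why the mixture is optimal in this unrestricted setting. With two measures, its objective equals (1 - alpha) * alpha * MMD^2(current, target).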
Related papers
- Uncertainty-aware Low-rank Q-matrix Estimation For Deep Reinforcement Learning (2021)
- Uncertainty Quantification And Exploration For Reinforcement Learning (2019)
- Deep Reinforcement Learning With Weighted Q-learning (2020)
- Minimax Weight And Q-function Learning For Off-policy Evaluation (2019)
- MMD-MIX: Value Function Factorisation With Maximum Mean Discrepancy For Cooperative Multi-agent Reinforcement Learning (2021)
- Smart Exploration In Reinforcement Learning Using Bounded Uncertainty Models (2025)
- UDQL: Bridging The Gap Between MSE Loss And The Optimal Value Function In Offline Reinforcement Learning (2024)
- Efficient And Robust Reinforcement Learning With Uncertainty-based Value Expansion (2019)