Deep Bayesian Bandits Showdown: An Empirical Comparison Of Bayesian Deep Networks For Thompson Sampling
2018 · Carlos Riquelme, George Tucker, Jasper Snoek
Abstract
Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains largely unsolved. Thompson Sampling and its extension to reinforcement learning provide an elegant approach to exploration that only requires access to posterior samples of the model. At the same time, advances in approximate Bayesian methods have made posterior approximation for flexible neural network models practical. Thus, it is attractive to consider approximate Bayesian neural networks in a Thompson Sampling framework. To understand the impact of using an approximate posterior on Thompson Sampling, we benchmark well-established and recently developed methods for approximate posterior sampling combined with Thompson Sampling over a series of contextual bandit problems. We found that many approaches that have been successful in the supervised le
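To illustrate the abstract's point that Thompson Sampling only needs posterior samples of the model, here is a minimal sketch on a toy contextual bandit. It assumes a linear-Gaussian reward model per arm with known noise variance, so the posterior is exact. This is a simplification for illustration, not one of the approximate Bayesian neural network methods the paper benchmarks. The names `LinearThompsonArm` and `run_thompson_sampling`, the synthetic environment, and all parameter values are hypothetical.

```python
import numpy as np


class LinearThompsonArm:
    """Exact Bayesian linear regression for one arm (assumed known noise variance)."""

    def __init__(self, dim, prior_precision=1.0, noise_var=1.0):
        self.precision = prior_precision * np.eye(dim)  # posterior precision matrix
        self.b = np.zeros(dim)                          # precision-weighted mean
        self.noise_var = noise_var

    def posterior_mean(self):
        return np.linalg.solve(self.precision, self.b)

    def sample_weights(self, rng):
        # Draw w ~ N(mean, precision^{-1}) via the Cholesky factor of the precision.
        chol = np.linalg.cholesky(self.precision)       # precision = chol @ chol.T
        z = rng.standard_normal(self.b.shape[0])
        return self.posterior_mean() + np.linalg.solve(chol.T, z)

    def update(self, context, reward):
        self.precision += np.outer(context, context) / self.noise_var
        self.b += context * reward / self.noise_var


def run_thompson_sampling(true_weights, n_rounds=500, noise_std=0.1, seed=0):
    """Run Thompson Sampling on a synthetic linear bandit; return (regret, arms)."""
    rng = np.random.default_rng(seed)
    n_arms, dim = true_weights.shape
    arms = [LinearThompsonArm(dim, noise_var=noise_std**2) for _ in range(n_arms)]
    regret = 0.0
    for _ in range(n_rounds):
        context = rng.standard_normal(dim)
        # One posterior sample per arm, then act greedily on the sampled models.
        sampled_rewards = [arm.sample_weights(rng) @ context for arm in arms]
        chosen = int(np.argmax(sampled_rewards))
        expected = true_weights @ context               # true mean reward of each arm
        reward = expected[chosen] + noise_std * rng.standard_normal()
        arms[chosen].update(context, reward)
        regret += expected.max() - expected[chosen]
    return regret, arms


if __name__ == "__main__":
    total_regret, _ = run_thompson_sampling(np.eye(3, 4))
    print(f"cumulative regret over 500 rounds: {total_regret:.2f}")
```

The only model-specific step is `sample_weights`. Replacing it with a draw from an approximate posterior over neural network weights yields the kind of setup the abstract describes, and the quality of that approximation is what the benchmark examines.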
Related papers
- Bayesian Bandits: Balancing The Exploration-exploitation Tradeoff Via Double Sampling (2017)
- BOTS: Batch Bayesian Optimization Of Extended Thompson Sampling For Severely Episode-limited RL Settings (2024)
- A Provably Efficient Model-free Posterior Sampling Method For Episodic Reinforcement Learning (2022)
- Approximate Thompson Sampling Via Epistemic Neural Networks (2023)
- Langevin Thompson Sampling With Logarithmic Communication: Bandits And Reinforcement Learning (2023)
- More Efficient Randomized Exploration For Reinforcement Learning Via Approximate Sampling (2024)
- Policy Gradient Optimization Of Thompson Sampling Policies (2020)
- A Bandit Framework For Optimal Selection Of Reinforcement Learning Agents (2019)