Bayesian Bandits: Balancing The Exploration-exploitation Tradeoff Via Double Sampling
2017 · Iñigo Urteaga, Chris H. Wiggins
Abstract
Reinforcement learning studies how to balance exploration and exploitation in real-world systems, optimizing interactions with the world while simultaneously learning how the world operates. One general class of algorithms for such learning is the multi-armed bandit setting. Randomized probability matching, based upon the Thompson sampling approach introduced in the 1930s, has recently been shown to perform well and to enjoy provable optimality properties. It permits generative, interpretable modeling in a Bayesian setting, where prior knowledge is incorporated, and the computed posteriors naturally capture the full state of knowledge. In this work, we harness the information contained in the Bayesian posterior and estimate its sufficient statistics via sampling. In several application domains, for example in health and medicine, each interaction with the world can be expensive and invasive, whereas drawing samples from the model is relatively inexpensive. Exploiting this viewpoint, we
Related papers
- Deep Bayesian Bandits Showdown: An Empirical Comparison Of Bayesian Deep Networks For Thompson Sampling (2018)
- Bandit Social Learning: Exploration Under Myopic Behavior (2023)
- A Frequency-domain Analysis Of The Multi-armed Bandit Problem: A New Perspective On The Exploration-exploitation Trade-off (2025)
- BOTS: Batch Bayesian Optimization Of Extended Thompson Sampling For Severely Episode-limited RL Settings (2024)
- Near-optimal Collaborative Learning In Bandits (2022)
- Principal-agent Bandit Games With Self-interested And Exploratory Learning Agents (2024)
- Langevin Thompson Sampling With Logarithmic Communication: Bandits And Reinforcement Learning (2023)
- Design Experiments To Compare Multi-armed Bandit Algorithms (2026)