Computing The Performance Of A New Adaptive Sampling Algorithm Based On The Gittins Index In Experiments With Exponential Rewards
2023 · James K. He, Sofía S. Villar, Lida Mavrogonatou
Abstract
Designing experiments often requires balancing learning about the true treatment effects against earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computational efficiency goals, and it has recently been used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially distributed rewards. We report its performance in simulated 2-armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the
Related papers
- Design Experiments To Compare Multi-armed Bandit Algorithms (2026)
- GINO-Q: Learning An Asymptotically Optimal Index Policy For Restless Multi-armed Bandits (2024)
- Bayesian Bandits: Balancing The Exploration-exploitation Tradeoff Via Double Sampling (2017)
- Adaptive Sequential Experiments With Unknown Information Arrival Processes (2019)
- Adapting Behaviour For Learning Progress (2019)
- Statistically Efficient Bayesian Sequential Experiment Design Via Reinforcement Learning With Cross-entropy Estimators (2023)
- Provably Efficient Information-directed Sampling Algorithms For Multi-agent Reinforcement Learning (2024)
- Trading Off Rewards And Errors In Multi-armed Bandits (2026)