Adaptive Sequential Experiments With Unknown Information Arrival Processes
2019 · Yonatan Gur, Ahmadreza Momeni
Abstract
Sequential experiments are often characterized by an exploration-exploitation tradeoff that is captured by the multi-armed bandit (MAB) framework. This framework has been studied and applied, typically when at each time period feedback is received only on the action that was selected at that period. However, in many practical settings additional data may become available between decision epochs. We introduce a generalized MAB formulation, which considers a broad class of distributions that are informative about mean rewards, and allows observations from these distributions to arrive according to an arbitrary and a priori unknown arrival process. When it is known how to map auxiliary data to reward estimates, by obtaining matching lower and upper bounds we characterize a spectrum of minimax complexities for this class of problems as a function of the information arrival process, which captures how salient characteristics of this process impact achievable performance. In terms of achievi
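To make the setting in the abstract concrete, here is a minimal Python sketch of a bandit loop in which extra observations can arrive between decision epochs. It is an illustration under stated assumptions, not the authors' policy. It assumes Bernoulli rewards, assumes each auxiliary observation is itself an unbiased reward sample for its arm (the simplest case of a known mapping from auxiliary data to reward estimates), and uses a standard UCB1-style index computed over all observations. The names `ucb_with_auxiliary`, `means`, and `aux_arrivals` are hypothetical.

```python
import math
import random


def ucb_with_auxiliary(means, horizon, aux_arrivals, seed=0):
    """UCB1-style policy that also uses auxiliary observations.

    means        -- true Bernoulli mean of each arm (hypothetical test input)
    horizon      -- number of decision periods
    aux_arrivals -- dict: period t -> list of arms that receive one free
                    auxiliary observation before the decision at t
                    (assumed to be an unbiased reward sample for that arm)
    Returns (pulls, counts): pulls per arm, and total observations per arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # all observations: pulls plus auxiliary arrivals
    sums = [0.0] * k   # sum of observed rewards per arm
    pulls = [0] * k    # decisions only

    def sample(arm):
        return 1.0 if rng.random() < means[arm] else 0.0

    for t in range(1, horizon + 1):
        # Fold in any auxiliary observations that arrived before this epoch.
        for arm in aux_arrivals.get(t, []):
            sums[arm] += sample(arm)
            counts[arm] += 1

        # Play any arm that has no observations yet; otherwise use the index.
        unobserved = [a for a in range(k) if counts[a] == 0]
        if unobserved:
            arm = unobserved[0]
        else:
            total = sum(counts)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )

        sums[arm] += sample(arm)
        counts[arm] += 1
        pulls[arm] += 1

    return pulls, counts


if __name__ == "__main__":
    pulls, counts = ucb_with_auxiliary(
        means=[0.3, 0.7], horizon=200, aux_arrivals={1: [0, 1], 50: [0]}
    )
    print("pulls per arm:", pulls)
    print("observations per arm:", counts)
```

Because the index depends on the total observation count rather than the number of pulls, an arm that receives auxiliary data gets a tighter confidence bound without being played, which is the intuition behind exploration cost falling as more information arrives.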
Related papers
- A Frequency-domain Analysis Of The Multi-armed Bandit Problem: A New Perspective On The Exploration-exploitation Trade-off (2025) · 0.00
- Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback (2024) · 0.00
- Pure Exploration Under Mediators' Feedback (2023) · 0.00
- Learning In Restless Bandits Under Exogenous Global Markov Process (2021) · 6.34
- Online Learning With Costly Features In Non-stationary Environments (2023) · 0.00
- Online Learning For Cooperative Multi-player Multi-armed Bandits (2021) · 5.24
- Design Experiments To Compare Multi-armed Bandit Algorithms (2026) · 0.00
- A New Bandit Setting Balancing Information From State Evolution And Corrupted Context (2020) · 0.00