Non-stationary Latent Auto-regressive Bandits
2024 Β· Anna L. Trella, Walter Dempsey, Asim H. Gazi, et al.
Abstract
For the non-stationary multi-armed bandit (MAB) problem, many existing methods allow a general mechanism for the non-stationarity, but rely on a budget for the non-stationarity that is sub-linear to the total number of time steps \(T\). In many real-world settings, however, the mechanism for the non-stationarity can be modeled, but there is no budget for the non-stationarity. We instead consider the non-stationary bandit problem where the reward means change due to a latent, auto-regressive (AR) state. We develop Latent AR LinUCB (LARL), an online linear contextual bandit algorithm that does not rely on the non-stationary budget, but instead forms good predictions of reward means by implicitly predicting the latent state. The key idea is to reduce the problem to a linear dynamical system which can be solved as a linear contextual bandit. In fact, LARL approximates a steady-state Kalman filter and efficiently learns system parameters online. We provide an interpretable regret bound for
Authors
(none)
Tags
Stats
Related papers
- Online Learning For Cooperative Multi-player Multi-armed Bandits (2021)5.24
- Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback (2024)0.00
- A Closer Look At The Worst-case Behavior Of Multi-armed Bandit Algorithms (2021)0.00
- A Frequency-domain Analysis Of The Multi-armed Bandit Problem: A New Perspective On The Exploration-exploitation Trade-off (2025)0.00
- Learning In Restless Bandits Under Exogenous Global Markov Process (2021)6.34
- Stochastic Multi-armed Bandits With Limited Control Variates (2026)0.00
- Q-learning Lagrange Policies For Multi-action Restless Bandits (2021)8.35
- Unlearning Offline Stochastic Multi-armed Bandits (2026)0.00