Optimal Policies For Observing Time Series And Related Restless Bandit Problems
2017 Β· Christopher R. Dance, Tomi Silander
Abstract
The trade-off between the cost of acquiring and processing data, and uncertainty due to a lack of data is fundamental in machine learning. A basic instance of this trade-off is the problem of deciding when to make noisy and costly observations of a discrete-time Gaussian random walk, so as to minimise the posterior variance plus observation costs. We present the first proof that a simple policy, which observes when the posterior variance exceeds a threshold, is optimal for this problem. The proof generalises to a wide range of cost functions other than the posterior variance. This result implies that optimal policies for linear-quadratic-Gaussian control with costly observations have a threshold structure. It also implies that the restless bandit problem of observing multiple such time series, has a well-defined Whittle index. We discuss computation of that index, give closed-form formulae for it, and compare the performance of the associated index policy with heuristic policies. T
Authors
(none)
Tags
Stats
Related papers
- Restless Bandit Problem With Rewards Generated By A Linear Gaussian Dynamical System (2024)0.00
- Quick Best Action Identification In Linear Bandit Problems (2018)0.00
- Beyond Variance Reduction: Understanding The True Impact Of Baselines On Policy Optimization (2020)0.00
- GINO-Q: Learning An Asymptotically Optimal Index Policy For Restless Multi-armed Bandits (2024)0.00
- Finite-time Analysis Of Whittle Index Based Q-learning For Restless Multi-armed Bandits With Neural Network Function Approximation (2023)0.00
- Multi-action Restless Bandits With Weakly Coupled Constraints: Simultaneous Learning And Control (2024)0.00
- Anytime-valid Off-policy Inference For Contextual Bandits (2022)2.26
- Simple And Optimal Methods For Stochastic Variational Inequalities, II: Markovian Noise And Policy Evaluation In Reinforcement Learning (2020)8.60