Finite-time Analysis Of Whittle Index Based Q-learning For Restless Multi-armed Bandits With Neural Network Function Approximation
2023 Β· Guojun Xiong, Jian Li
Abstract
Whittle index policy is a heuristic to the intractable restless multi-armed bandits (RMAB) problem. Although it is provably asymptotically optimal, finding Whittle indices remains difficult. In this paper, we present Neural-Q-Whittle, a Whittle index based Q-learning algorithm for RMAB with neural network function approximation, which is an example of nonlinear two-timescale stochastic approximation with Q-function values updated on a faster timescale and Whittle indices on a slower timescale. Despite the empirical success of deep Q-learning, the non-asymptotic convergence rate of Neural-Q-Whittle, which couples neural networks with two-timescale Q-learning largely remains unclear. This paper provides a finite-time analysis of Neural-Q-Whittle, where data are generated from a Markov chain, and Q-function is approximated by a ReLU neural network. Our analysis leverages a Lyapunov drift approach to capture the evolution of two coupled parameters, and the nonlinearity in value function ap
Authors
(none)
Tags
Stats
Related papers
- Q-learning Lagrange Policies For Multi-action Restless Bandits (2021)8.35
- GINO-Q: Learning An Asymptotically Optimal Index Policy For Restless Multi-armed Bandits (2024)0.00
- A Finite-time Analysis Of Q-learning With Neural Network Function Approximation (2019)0.00
- Towards A Pretrained Model For Restless Bandits Via Multi-arm Generalization (2023)0.00
- Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback (2024)0.00
- Optimal Policies For Observing Time Series And Related Restless Bandit Problems (2017)0.00
- Learning In Restless Bandits Under Exogenous Global Markov Process (2021)6.34
- Deep Q-learning: Theoretical Insights From An Asymptotic Analysis (2020)10.35