Towards A Pretrained Model For Restless Bandits Via Multi-arm Generalization
2023 · Yunfan Zhao, Nikhil Behari, Edward Hughes, et al.
Abstract
Restless multi-arm bandits (RMABs), a class of resource allocation problems with broad application in areas such as healthcare, online advertising, and anti-poaching, have recently been studied from a multi-agent reinforcement learning perspective. Prior RMAB research suffers from several limitations, e.g., it fails to adequately address continuous states, and it requires retraining from scratch when arms opt in and opt out over time, a common challenge in many real-world applications. We address these limitations by developing a neural network-based pre-trained model (PreFeRMAB) that has general zero-shot ability on a wide range of previously unseen RMABs, and which can be fine-tuned on specific instances in a more sample-efficient way than retraining from scratch. Our model also accommodates general multi-action settings and discrete or continuous state spaces. To enable fast generalization, we learn a novel single policy network model that utilizes feature information and employs a tra
Related papers
- Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback (2024)
- Q-learning Lagrange Policies For Multi-action Restless Bandits (2021)
- Learning In Restless Bandits Under Exogenous Global Markov Process (2021)
- IRL For Restless Multi-armed Bandits With Applications In Maternal And Child Health (2024)
- GINO-Q: Learning An Asymptotically Optimal Index Policy For Restless Multi-armed Bandits (2024)
- Multi-action Restless Bandits With Weakly Coupled Constraints: Simultaneous Learning And Control (2024)
- Finite-time Analysis Of Whittle Index Based Q-learning For Restless Multi-armed Bandits With Neural Network Function Approximation (2023)
- Fitting Reinforcement Learning Model To Behavioral Data Under Bandits (2025)