Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback
2024 · Guojun Xiong, Jian Li
Abstract
Restless multi-armed bandits (RMAB) play a central role in modeling sequential decision-making problems under an instantaneous activation constraint: at most B arms can be activated at any decision epoch. Each restless arm is endowed with a state that evolves independently according to a Markov decision process, whether or not the arm is activated. In this paper, we consider the task of learning in episodic RMAB with unknown transition functions and adversarial rewards, which can change arbitrarily across episodes. Further, we consider a challenging but natural bandit feedback setting in which only the adversarial rewards of activated arms are revealed to the decision maker (DM). The goal of the DM is to maximize its total adversarial reward during the learning process while satisfying the instantaneous activation constraint in each decision epoch. We develop a novel reinforcement learning algorithm with two key contributors: a novel biased adversarial reward estimator to deal with
Related papers
- Learning In Restless Bandits Under Exogenous Global Markov Process (2021) 6.34
- Towards A Pretrained Model For Restless Bandits Via Multi-arm Generalization (2023) 0.00
- Q-learning Lagrange Policies For Multi-action Restless Bandits (2021) 8.35
- Multi-action Restless Bandits With Weakly Coupled Constraints: Simultaneous Learning And Control (2024) 0.00
- Online Learning For Cooperative Multi-player Multi-armed Bandits (2021) 5.24
- Robust Model-based Reinforcement Learning With An Adversarial Auxiliary Model (2024) 0.00
- Non-stationary Latent Auto-regressive Bandits (2024) 0.00
- IRL For Restless Multi-armed Bandits With Applications In Maternal And Child Health (2024) 0.00