Restless Bandit Problem With Rewards Generated By A Linear Gaussian Dynamical System
2024 · Jonathan Gornet, Bruno Sinopoli
Abstract
Decision-making under uncertainty is a fundamental, frequently encountered problem that can be formulated as a stochastic multi-armed bandit problem. In this problem, the learner interacts with an environment by choosing an action at each round, where a round is one instance of an interaction. In response, the environment reveals to the learner a reward sampled from a stochastic process. The goal of the learner is to maximize cumulative reward. In this work, we assume that the rewards are the inner product of an action vector and a state vector generated by a linear Gaussian dynamical system. We propose a method that predicts each action's next reward as a linear combination of previously observed rewards. We show that, regardless of the sequence of previous actions chosen, the reward sampled for any previously chosen action can be used for predicting another action's future reward, i.e., the reward sampled for action 1 at round \(t
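The reward model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: the state matrix `A`, the action vectors `a1` and `a2`, the noise levels, and the single-coefficient least-squares predictor are all assumptions chosen for illustration. It generates rewards as inner products of an action vector with the state of a linear Gaussian dynamical system, then uses the reward of action 1 at one round to predict the reward of action 2 at the next round.

```python
import random

# Hypothetical system parameters (not taken from the paper).
A = [[0.9, 0.1], [0.0, 0.8]]   # stable state-transition matrix
a1 = (1.0, 0.0)                # action vector for action 1
a2 = (0.6, 0.8)                # action vector for action 2
PROCESS_STD = 1.0              # std of the Gaussian process noise
REWARD_STD = 0.1               # std of the Gaussian reward noise


def simulate(num_rounds, seed=0):
    """Simulate x_{t+1} = A x_t + w_t and rewards r_i = <a_i, x_t> + v_t."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    r1, r2 = [], []
    for _ in range(num_rounds):
        r1.append(a1[0] * x[0] + a1[1] * x[1] + rng.gauss(0.0, REWARD_STD))
        r2.append(a2[0] * x[0] + a2[1] * x[1] + rng.gauss(0.0, REWARD_STD))
        x = [
            A[0][0] * x[0] + A[0][1] * x[1] + rng.gauss(0.0, PROCESS_STD),
            A[1][0] * x[0] + A[1][1] * x[1] + rng.gauss(0.0, PROCESS_STD),
        ]
    return r1, r2


def fit_coefficient(inputs, targets):
    """One-coefficient least squares: minimize sum (c * u - v)^2 over c."""
    numerator = sum(u * v for u, v in zip(inputs, targets))
    denominator = sum(u * u for u in inputs)
    return numerator / denominator


def mean_squared_error(predictions, targets):
    return sum((p - v) ** 2 for p, v in zip(predictions, targets)) / len(targets)


r1, r2 = simulate(2000)
inputs, targets = r1[:-1], r2[1:]   # action 1 at round t -> action 2 at round t+1
split = 1000                        # fit on the first half, evaluate on the rest

c = fit_coefficient(inputs[:split], targets[:split])
mse_pred = mean_squared_error([c * u for u in inputs[split:]], targets[split:])
mse_zero = mean_squared_error([0.0] * len(targets[split:]), targets[split:])

print(f"coefficient: {c:.3f}")
print(f"MSE using action 1's reward: {mse_pred:.3f}")
print(f"MSE predicting zero:         {mse_zero:.3f}")
```

Because both rewards are driven by the same underlying state, the reward of one action carries information about the other action's next reward, so the fitted predictor should do better than ignoring the observation. A predictor that combines several past rewards would extend `fit_coefficient` to a multi-coefficient least-squares fit.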
Related papers
- An Exploration-free Method For A Linear Stochastic Bandit Driven By A Linear Gaussian Dynamical System (2025)
- Learning In Restless Bandits Under Exogenous Global Markov Process (2021)
- Optimal Policies For Observing Time Series And Related Restless Bandit Problems (2017)
- Quick Best Action Identification In Linear Bandit Problems (2018)
- Multi-action Restless Bandits With Weakly Coupled Constraints: Simultaneous Learning And Control (2024)
- A New Bandit Setting Balancing Information From State Evolution And Corrupted Context (2020)
- Principal-agent Bandit Games With Self-interested And Exploratory Learning Agents (2024)
- Balancing Act: Prioritization Strategies For LLM-designed Restless Bandit Rewards (2024)