An Exploration-free Method For A Linear Stochastic Bandit Driven By A Linear Gaussian Dynamical System
2025 · Jonathan Gornet, Yilin Mo, Bruno Sinopoli
Abstract
In stochastic multi-armed bandits, a major problem the learner faces is the trade-off between exploration and exploitation. Recently, exploration-free methods, which commit to the action predicted to return the highest reward, have been studied from the perspective of linear bandits. In this paper, we introduce a linear bandit setting where the reward is the output of a linear Gaussian dynamical system. Motivated by a problem encountered in hyperparameter optimization for reinforcement learning, where the number of actions is much higher than the number of training iterations, we propose Kalman filter Observability Dependent Exploration (KODE), an exploration-free method that uses the Kalman filter predictions to select actions. The main contribution of this work is an analysis of the proposed method's performance, which depends on the observability properties of the underlying linear Gaussian dynamical system. We evaluate KODE via two different metrics: regr
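To make the idea of an exploration-free, Kalman-filter-driven selection rule concrete, the following is a minimal sketch and not the authors' KODE algorithm. It assumes a hidden state that evolves as z_{t+1} = A z_t + w_t, a scalar reward y_t = <c_a, z_t> + v_t for the chosen action vector c_a, and known A, Q, and noise variance r. All names and numeric values (`run`, `greedy_action`, the matrices, the action set) are hypothetical choices for illustration. At each round the learner predicts the state, picks the action with the highest predicted reward with no exploration bonus, and then updates the filter with the observed reward.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def predict(x, P, A, Q):
    """Time update: propagate the state estimate and its covariance one step."""
    n = len(x)
    x_pred = matvec(A, x)
    APAt = matmul(matmul(A, P), transpose(A))
    P_pred = [[APAt[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    return x_pred, P_pred

def update(x, P, c, y, r):
    """Measurement update for a scalar reward y = <c, state> + noise of variance r.

    Uses P c as the transpose of c^T P, which is valid because P is symmetric.
    """
    n = len(x)
    Pc = matvec(P, c)
    s = dot(c, Pc) + r                      # innovation variance (scalar)
    k = [p / s for p in Pc]                 # Kalman gain (vector)
    resid = y - dot(c, x)                   # innovation
    x_new = [xi + ki * resid for xi, ki in zip(x, k)]
    P_new = [[P[i][j] - k[i] * Pc[j] for j in range(n)] for i in range(n)]
    return x_new, P_new

def greedy_action(x_pred, actions):
    """Index of the action with the highest predicted reward (no exploration)."""
    return max(range(len(actions)), key=lambda i: dot(actions[i], x_pred))

def run(T=200, seed=0):
    rng = random.Random(seed)
    # Hypothetical system and action set, chosen only for illustration.
    A = [[0.9, 0.1], [0.0, 0.8]]
    Q = [[0.1, 0.0], [0.0, 0.1]]            # diagonal process-noise covariance
    r = 0.1                                 # reward-noise variance
    actions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    z = [1.0, -1.0]                         # true hidden state (unknown to learner)
    x, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    chosen = []
    for _ in range(T):
        x, P = predict(x, P, A, Q)
        # The environment advances the hidden state with Gaussian process noise.
        z = [zi + rng.gauss(0.0, Q[i][i] ** 0.5)
             for i, zi in enumerate(matvec(A, z))]
        i = greedy_action(x, actions)
        y = dot(actions[i], z) + rng.gauss(0.0, r ** 0.5)
        x, P = update(x, P, actions[i], y, r)
        chosen.append(i)
    return chosen, x, P
```

Calling `run(T=50, seed=1)` returns the sequence of chosen action indices together with the final state estimate and covariance. Because only the chosen action's direction is measured each round, how well the filter tracks the state depends on which directions the greedy choices happen to observe, which is the kind of observability dependence the abstract refers to.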
Related papers
- Restless Bandit Problem With Rewards Generated By A Linear Gaussian Dynamical System (2024)
- Bandit Social Learning: Exploration Under Myopic Behavior (2023)
- Bayesian Bandits: Balancing The Exploration-exploitation Tradeoff Via Double Sampling (2017)
- Optimistic Active Exploration Of Dynamical Systems (2023)
- Principal-agent Bandit Games With Self-interested And Exploratory Learning Agents (2024)
- A Frequency-domain Analysis Of The Multi-armed Bandit Problem: A New Perspective On The Exploration-exploitation Trade-off (2025)
- Near-optimal Collaborative Learning In Bandits (2022)
- Exploration Versus Exploitation In Reinforcement Learning: A Stochastic Control Approach (2018)