Pure Exploration Under Mediators' Feedback
2023 · Riccardo Poiani, Alberto Maria Metelli, Marcello Restelli
Abstract
Stochastic multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a stochastic reward. Within the context of best-arm identification (BAI) problems, the goal of the agent lies in finding the optimal arm, i.e., the one with highest expected reward, as accurately and efficiently as possible. Nevertheless, the sequential interaction protocol of classical BAI problems, where the agent has complete control over the arm being pulled at each round, does not effectively model several decision-making problems of interest (e.g., off-policy learning, partially controllable environments, and human feedback). For this reason, in this work, we propose a novel strict generalization of the classical BAI problem that we refer to as best-arm identification under mediators' feedback (BAI-MF). More specifically, we consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on
Related papers
- Near-optimal Collaborative Learning In Bandits (2022)
- Bayesian Bandits: Balancing The Exploration-exploitation Tradeoff Via Double Sampling (2017)
- Online Learning For Cooperative Multi-player Multi-armed Bandits (2021)
- A Frequency-domain Analysis Of The Multi-armed Bandit Problem: A New Perspective On The Exploration-exploitation Trade-off (2025)
- Bandit Social Learning: Exploration Under Myopic Behavior (2023)
- Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback (2024)
- Flickering Multi-armed Bandits (2026)
- A New Bandit Setting Balancing Information From State Evolution And Corrupted Context (2020)