Principal-agent Bandit Games With Self-interested And Exploratory Learning Agents
2024 · Junyan Liu, Lillian J. Ratliff
Abstract
We study the repeated principal-agent bandit game, where the principal indirectly interacts with the unknown environment by proposing incentives for the agent to play arms. Most existing work assumes the agent has full knowledge of the reward means and always behaves greedily, but in many online marketplaces, the agent needs to learn the unknown environment and sometimes explore. Motivated by such settings, we model a self-interested learning agent with exploration behaviors who iteratively updates reward estimates and either selects an arm that maximizes the estimated reward plus incentive or explores arbitrarily with a certain probability. As a warm-up, we first consider a self-interested learning agent without exploration. We propose algorithms for both i.i.d. and linear reward settings with bandit feedback in a finite horizon \(T\), achieving regret bounds of \(\widetilde{O}(\sqrt{T})\) and \(\widetilde{O}(T^{2/3})\), respectively. Specifically, these algorithms are estab…
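To make the agent model above concrete, the following is a minimal Python sketch of a self-interested learning agent with exploration, written under stated assumptions rather than taken from the paper. The class name `ExploratoryLearningAgent` and the parameter `explore_prob` are hypothetical. Reward estimates are assumed to be empirical means. Exploration is assumed to be uniform over arms at a fixed rate, whereas the abstract only says the agent "explores arbitrarily with a certain probability". The sketch simulates the agent's side of the game only. It does not implement the principal's regret-minimizing algorithms.

```python
import random
from typing import List, Optional


class ExploratoryLearningAgent:
    """Hypothetical sketch of the agent model described in the abstract.

    Assumptions (not from the paper): estimates are empirical means,
    unplayed arms start at 0.0, and exploration is uniform over arms
    with a fixed probability `explore_prob`.
    """

    def __init__(self, n_arms: int, explore_prob: float, seed: Optional[int] = None):
        self.n_arms = n_arms
        self.explore_prob = explore_prob
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.rng = random.Random(seed)

    def estimates(self) -> List[float]:
        # Empirical mean reward of each arm (0.0 for arms never played).
        return [s / c if c > 0 else 0.0 for s, c in zip(self.sums, self.counts)]

    def choose(self, incentives: List[float]) -> int:
        # With probability explore_prob, explore (assumed: uniformly at random).
        if self.rng.random() < self.explore_prob:
            return self.rng.randrange(self.n_arms)
        # Otherwise pick the arm maximizing estimated reward plus incentive;
        # ties go to the lowest-indexed arm.
        scores = [m + pi for m, pi in zip(self.estimates(), incentives)]
        return max(range(self.n_arms), key=lambda a: scores[a])

    def update(self, arm: int, reward: float) -> None:
        # Fold the observed reward into the arm's running estimate.
        self.counts[arm] += 1
        self.sums[arm] += reward


if __name__ == "__main__":
    agent = ExploratoryLearningAgent(n_arms=3, explore_prob=0.0, seed=0)
    agent.update(0, 1.0)
    agent.update(1, 0.2)
    print(agent.choose([0.0, 0.0, 0.0]))  # 0: highest estimated reward
    print(agent.choose([0.0, 0.9, 0.0]))  # 1: 0.2 + 0.9 beats 1.0 + 0.0
```

The usage lines show why incentives matter in this setting: with no incentive the agent greedily plays the arm it currently believes is best, while a large enough incentive on another arm steers its choice. Setting `explore_prob` to 0 recovers the warm-up case of a learning agent without exploration.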
Related papers
- Bandit Social Learning With Exploration Episodes (2026)
- Bandit Social Learning: Exploration Under Myopic Behavior (2023)
- Near-optimal Collaborative Learning In Bandits (2022)
- Bayesian Bandits: Balancing The Exploration-exploitation Tradeoff Via Double Sampling (2017)
- Exploration And Incentives In Reinforcement Learning (2021)
- No-regret Learning In Unknown Games With Correlated Payoffs (2019)
- Learning A Game By Paying The Agents (2025)
- Stochastic Principal-agent Problems: Efficient Computation And Learning (2023)