Guarantees For Epsilon-greedy Reinforcement Learning With Function Approximation
2022 Β· Christoph Dann, Yishay Mansour, Mehryar Mohri, et al.
Abstract
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks and yet, they perform well in many others. In fact, in practice, they are often selected as the top choices, due to their simplicity. But, for what tasks do such policies succeed? Can we give theoretical guarantees for their favorable performance? These crucial questions have been scarcely investigated, despite the prominent practical importance of these policies. This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration. Our results apply to value-function-based algorithms in episodic MDPs with bounded Bellman Eluder dimension. We propose a new complexity measure called myopic exploration gap, denoted by alpha, that captures a structural property of the MDP, the exploration policy and the given value function class. We show that the
Authors
(none)
Tags
Stats
Related papers
- Improved Bounds For Reward-agnostic And Reward-free Exploration (2026)0.00
- Convergence Guarantees For Deep Epsilon Greedy Policy Learning (2021)0.00
- Exploration Conscious Reinforcement Learning Revisited (2018)0.00
- Understanding The Pathologies Of Approximate Policy Evaluation When Combined With Greedification In Reinforcement Learning (2020)0.00
- Learning Near Optimal Policies With Low Inherent Bellman Error (2020)0.00
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)0.00
- Reinforcement Learning With Unbiased Policy Evaluation And Linear Function Approximation (2022)0.00
- The Role Of Lookahead And Approximate Policy Evaluation In Reinforcement Learning With Linear Value Function Approximation (2021)0.00