Minimax-optimal Reward-agnostic Exploration In Reinforcement Learning
2023 Β· Gen Li, Yuling Yan, Yuxin Chen, et al.
Abstract
This paper studies reward-agnostic exploration in reinforcement learning (RL) -- a scenario where the learner is unware of the reward functions during the exploration stage -- and designs an algorithm that improves over the state of the art. More precisely, consider a finite-horizon inhomogeneous Markov decision process with \(S\) states, \(A\) actions, and horizon length \(H\), and suppose that there are no more than a polynomial number of given reward functions of interest. By collecting an order of \begin\{align*\} \frac\{SAH^3\}\{\epsilon^2\} \text\{ sample episodes (up to log factor)\} \end\{align*\} without guidance of the reward information, our algorithm is able to find \(\epsilon\)-optimal policies for all these reward functions, provided that \(\epsilon\) is sufficiently small. This forms the first reward-agnostic exploration scheme in this context that achieves provable minimax optimality. Furthermore, once the sample size exceeds \(\frac\{S^2AH^3\}\{\epsilon^2\}\) episode
Authors
(none)
Tags
Stats
Related papers
- Improved Bounds For Reward-agnostic And Reward-free Exploration (2026)0.00
- Gap-dependent Unsupervised Exploration For Reinforcement Learning (2021)0.00
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)0.00
- Provably Efficient Exploration For Reinforcement Learning Using Unsupervised Learning (2020)0.00
- Conservative Exploration In Reinforcement Learning (2020)0.00
- Fast Rates For Maximum Entropy Exploration (2023)0.00
- Provably Efficient Exploration In Policy Optimization (2019)0.00
- Exploration Conscious Reinforcement Learning Revisited (2018)0.00