Abstract

We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study two types of the maximum entropy exploration problem. The first type is visitation entropy maximization, previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose a game-theoretic algorithm with \(\widetilde{\mathcal{O}}(H^3S^2A/\epsilon^2)\) sample complexity, improving the \(\epsilon\)-dependence of existing results, where \(S\) is the number of states, \(A\) is the number of actions, \(H\) is the episode length, and \(\epsilon\) is the desired accuracy. The second type of entropy we study is the trajectory entropy. This objective is closely related to entropy-regularized MDPs, and we propose a simple algorithm with sample complexity of order \(\widetilde{\mathcal{O}}(\mathrm{poly}(S,A,H)/\epsilon)\). Interestingly, it is
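To make the two objectives named in the abstract concrete, the following minimal sketch evaluates both for a fixed policy in a toy tabular MDP. It is an illustration under stated assumptions, not the paper's algorithm: it assumes the visitation entropy is the sum over steps of the Shannon entropy of the state-action visitation distribution, that the trajectory entropy is the Shannon entropy of the distribution over trajectories \((s_1, a_1, \ldots, s_H, a_H)\), and that transitions are time-homogeneous. The names `P`, `pi`, `mu0`, and all function names are hypothetical.

```python
import itertools
import math


def entropy(probs):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def visitation_distributions(P, pi, mu0, H):
    """Return d with d[h][s][a] = Pr(s_h = s, a_h = a) under policy pi.

    Assumed layout (hypothetical): P[s][a][s2] is a transition probability,
    pi[h][s][a] is a step-dependent policy, mu0[s] is the initial distribution.
    """
    S, A = len(mu0), len(pi[0][0])
    state_dist = list(mu0)
    d = []
    for h in range(H):
        d_h = [[state_dist[s] * pi[h][s][a] for a in range(A)] for s in range(S)]
        d.append(d_h)
        # Push the state-action distribution through the transition kernel.
        state_dist = [
            sum(d_h[s][a] * P[s][a][s2] for s in range(S) for a in range(A))
            for s2 in range(S)
        ]
    return d


def visitation_entropy(P, pi, mu0, H):
    """Sum over steps of the entropy of the state-action visitation distribution."""
    d = visitation_distributions(P, pi, mu0, H)
    return sum(entropy(p for row in d_h for p in row) for d_h in d)


def trajectory_entropy(P, pi, mu0, H):
    """Entropy of the trajectory distribution, by brute-force enumeration.

    Only feasible for tiny S, A, H; used here purely for illustration.
    """
    S, A = len(mu0), len(pi[0][0])
    probs = []
    for states in itertools.product(range(S), repeat=H):
        for actions in itertools.product(range(A), repeat=H):
            p = mu0[states[0]]
            for h in range(H):
                p *= pi[h][states[h]][actions[h]]
                if h + 1 < H:
                    p *= P[states[h]][actions[h]][states[h + 1]]
            probs.append(p)
    return entropy(probs)


# Toy MDP: 2 states, 2 actions, horizon 2. Action 0 stays, action 1 flips the state.
H = 2
mu0 = [1.0, 0.0]
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1
]
pi = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(H)]  # uniform policy at every step

ve = visitation_entropy(P, pi, mu0, H)
te = trajectory_entropy(P, pi, mu0, H)
print("visitation entropy:", ve)
print("trajectory entropy:", te)
```

In this toy example the two quantities differ even for the same policy: the visitation entropy adds up per-step marginal entropies, while the trajectory entropy accounts for the dependence between consecutive steps.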

Authors

(none)

Tags

  • Exploration

Stats

  • citations: 0
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: tiapkin2023fast

Related papers