Fast Rates For Maximum Entropy Exploration
2023 · Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, et al.
Abstract
We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study two types of the maximum entropy exploration problem. The first type is visitation entropy maximization, previously considered by Hazan et al. (2019) in the discounted setting. For this type of exploration, we propose a game-theoretic algorithm with \(\widetilde{\mathcal{O}}(H^3S^2A/\epsilon^2)\) sample complexity, thus improving the \(\epsilon\)-dependence of existing results, where \(S\) is the number of states, \(A\) is the number of actions, \(H\) is the episode length, and \(\epsilon\) is the desired accuracy. The second type of entropy we study is the trajectory entropy. This objective function is closely related to entropy-regularized MDPs, and we propose a simple algorithm with a sample complexity of order \(\widetilde{\mathcal{O}}(\mathrm{poly}(S,A,H)/\epsilon)\). Interestingly, it is …
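The abstract does not spell out the two objectives. As a hedged illustration, assuming the usual finite-horizon tabular definitions (visitation entropy as the sum over steps \(h\) of the entropy of the state-action visitation distribution \(d^\pi_h(s,a)\); trajectory entropy as the entropy of the law of the whole trajectory \((s_1,a_1,\dots,s_H,a_H)\)), the sketch below evaluates both objectives for a fixed Markov policy in a small, fully known MDP. It only evaluates the objectives; it is not the paper's game-theoretic or regularized learning algorithm, which must work from samples of unknown transitions. All function names and the array layout `P[h][s][a][s2]`, `pi[h][s][a]` are hypothetical choices made for this example.

```python
import itertools
import math


def entropy(dist):
    """Shannon entropy (in nats) of a probability vector; 0 * log 0 is treated as 0."""
    return -sum(q * math.log(q) for q in dist if q > 0.0)


def visitation(mu, P, pi, H):
    """Step-wise state-action visitation d[h][s][a] in a known tabular MDP.

    Assumed layout (hypothetical): mu[s] is the initial state distribution,
    P[h][s][a][s2] the transition probabilities, pi[h][s][a] a Markov policy.
    """
    S, A = len(mu), len(pi[0][0])
    d, state_dist = [], list(mu)
    for h in range(H):
        d_h = [[state_dist[s] * pi[h][s][a] for a in range(A)] for s in range(S)]
        d.append(d_h)
        if h + 1 < H:  # propagate the state distribution to step h + 1
            state_dist = [
                sum(d_h[s][a] * P[h][s][a][s2] for s in range(S) for a in range(A))
                for s2 in range(S)
            ]
    return d


def visitation_entropy(d):
    """Assumed objective: sum over steps h of the entropy of d_h(s, a)."""
    return sum(entropy([x for row in d_h for x in row]) for d_h in d)


def trajectory_entropy(mu, P, pi, H):
    """Entropy of the law of (s_1, a_1, ..., s_H, a_H) via the chain rule:
    H(s_1) + sum_h E[H(pi_h(.|s_h))] + sum_{h<H} E[H(P_h(.|s_h, a_h))]."""
    d = visitation(mu, P, pi, H)
    S, A = len(mu), len(pi[0][0])
    total = entropy(mu)
    for h in range(H):
        for s in range(S):
            total += sum(d[h][s]) * entropy(pi[h][s])  # policy-entropy term
            if h + 1 < H:  # transition-entropy term (no transition after step H)
                total += sum(d[h][s][a] * entropy(P[h][s][a]) for a in range(A))
    return total


def trajectory_entropy_bruteforce(mu, P, pi, H):
    """Same quantity by enumerating all (S*A)^H trajectories; only for tiny sanity checks."""
    S, A = len(mu), len(pi[0][0])
    probs = []
    for traj in itertools.product(range(S), range(A), repeat=H):
        s, a = traj[0::2], traj[1::2]  # alternating states and actions
        prob = mu[s[0]] * pi[0][s[0]][a[0]]
        for h in range(1, H):
            prob *= P[h - 1][s[h - 1]][a[h - 1]][s[h]] * pi[h][s[h]][a[h]]
        probs.append(prob)
    return entropy(probs)


if __name__ == "__main__":
    H = 3
    mu = [1.0, 0.0]
    step_P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.0, 1.0]]]
    P = [step_P] * H
    pi = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(H)]  # uniform policy
    print("visitation entropy:", visitation_entropy(visitation(mu, P, pi, H)))
    print("trajectory entropy:", trajectory_entropy(mu, P, pi, H))
    print("brute-force check: ", trajectory_entropy_bruteforce(mu, P, pi, H))
```

The chain-rule form suggests why trajectory entropy relates to entropy-regularized MDPs: it is an expected sum of per-step entropy terms accumulated along the trajectory, much like a reward bonus, whereas visitation entropy is a nonlinear function of the visitation distributions \(d^\pi_h\).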
Related papers
- Maximum Entropy Exploration Without The Rollouts (2026)
- Maximum-entropy Exploration With Future State-action Visitation Measures (2026)
- The Importance Of Non-markovianity In Maximum State Entropy Exploration (2022)
- Provably Efficient Maximum Entropy Exploration (2018)
- K-means Maximum Entropy Exploration (2022)
- Off-policy Maximum Entropy RL With Future State And Action Visitation Measures (2024)
- Accelerating Reinforcement Learning With Value-conditional State Entropy Exploration (2023)
- Task-agnostic Exploration Via Policy Gradient Of A Non-parametric State Entropy Estimate (2020)