Curious Explorer: A Provable Exploration Strategy In Policy Learning
2021 · Marco Miani, Maurizio Parton, Marco Romito
Abstract
Having access to an exploring restart distribution (the so-called wide coverage assumption) is critical for policy gradient methods. This is because, while the objective function is insensitive to updates in unlikely states, the agent may still need improvements in those states in order to reach a nearly optimal payoff. For this reason, wide coverage is used in some form when analyzing theoretical properties of practical policy gradient methods. However, this assumption can be infeasible in certain environments, for instance when learning is online, or when restarts are possible only from a fixed initial state. In these cases, classical policy gradient algorithms can have very poor convergence properties and sample efficiency. In this paper, we develop Curious Explorer, a novel and simple iterative state-space exploration strategy that can be used with any starting distribution \(\rho\). Curious Explorer starts from \(\rho\), then using intrinsic rewards assigned to the s…
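The claim that the objective is insensitive to unlikely states can be made concrete with the standard policy gradient theorem, \(\nabla J_\rho(\theta) = \frac{1}{1-\gamma}\,\mathbb{E}_{s\sim d^{\pi}_{\rho}}\,\mathbb{E}_{a\sim\pi(\cdot\mid s)}[\nabla_\theta \log \pi(a\mid s)\,Q^{\pi}(s,a)]\): each state's contribution is weighted by its discounted visitation \(d^{\pi}_{\rho}(s)\). The sketch below, a minimal illustration and not the paper's method (it does not implement Curious Explorer), compares that weight on the far end of a chain when restarts come only from a fixed initial state versus from a uniform exploring restart distribution. The 10-state chain, the left/right random-walk policy, \(\gamma = 0.9\) and `p_right = 0.2` are all illustrative assumptions.

```python
import numpy as np


def chain_transition_matrix(n_states: int, p_right: float) -> np.ndarray:
    """State-to-state matrix P_pi for a random-walk policy on a chain (hypothetical setup).

    From state s the policy moves right with probability p_right and left
    otherwise; moves that would leave the chain keep the agent in place.
    """
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        P[s, min(s + 1, n_states - 1)] += p_right
        P[s, max(s - 1, 0)] += 1.0 - p_right
    return P


def discounted_visitation(P: np.ndarray, rho: np.ndarray, gamma: float) -> np.ndarray:
    """d_rho^pi(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s | s_0 ~ rho)."""
    n = P.shape[0]
    # Row-vector identity d^T = (1 - gamma) rho^T (I - gamma P)^{-1},
    # computed by solving the transposed linear system instead of inverting.
    return (1.0 - gamma) * np.linalg.solve((np.eye(n) - gamma * P).T, rho)


n, gamma, p_right = 10, 0.9, 0.2
P = chain_transition_matrix(n, p_right)

fixed_start = np.zeros(n)
fixed_start[0] = 1.0                   # restarts possible only from state 0
uniform_restart = np.full(n, 1.0 / n)  # exploring restarts (wide coverage)

d_fixed = discounted_visitation(P, fixed_start, gamma)
d_uniform = discounted_visitation(P, uniform_restart, gamma)

# The gradient contribution of the last state scales with these weights.
print(f"weight on last state, fixed start:     {d_fixed[-1]:.2e}")
print(f"weight on last state, uniform restart: {d_uniform[-1]:.2e}")
```

Under the fixed start the far state receives a weight that is orders of magnitude smaller, so gradient steps barely change the policy there, even if improving it would matter for the optimal payoff; a uniform restart distribution guarantees every state at least \((1-\gamma)/n\) of the weight.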
Related papers
- Behind The Myth Of Exploration In Policy Gradients (2024)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- Provably Efficient Exploration In Policy Optimization (2019)
- Exploring Restart Distributions (2018)
- Exploration Conscious Reinforcement Learning Revisited (2018)
- Careful At Estimation And Bold At Exploration (2023)
- Policy Gradient From Demonstration And Curiosity (2020)