Decoupled Exploration And Exploitation Policies For Sample-efficient Reinforcement Learning
2021 · William F. Whitney, Michael Bloesch, Jost Tobias Springenberg, et al.
Abstract
Despite the close connection between exploration and sample efficiency, most state-of-the-art reinforcement learning algorithms include no considerations for exploration beyond maximizing the entropy of the policy. In this work we address this seeming missed opportunity. We observe that the most common formulation of directed exploration in deep RL, known as bonus-based exploration (BBE), suffers from bias and slow coverage in the few-sample regime. This causes BBE to be actively detrimental to policy learning in many control tasks. We show that by decoupling the task policy from the exploration policy, directed exploration can be highly effective for sample-efficient continuous control. Our method, Decoupled Exploration and Exploitation Policies (DEEP), can be combined with any off-policy RL algorithm without modification. When used in conjunction with soft actor-critic, DEEP incurs no performance penalty in densely-rewarding environments. On sparse environments, DEEP gives a several-
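The abstract's central idea can be illustrated with a toy tabular sketch. An exploration policy trained on task reward plus an exploration bonus collects all the data. A separate task policy learns off-policy from the same replay buffer using only the task reward, so the bonus never biases it. This is not the authors' implementation. Tabular Q-learning stands in for the off-policy learner (the paper pairs DEEP with soft actor-critic in continuous control). The count-based bonus `beta / sqrt(N(s, a))` is one common BBE choice, assumed here for illustration. All names (`DecoupledAgent`, `beta`, `lr`, and so on) are hypothetical.

```python
import math
import random
from collections import defaultdict


class DecoupledAgent:
    """Hypothetical tabular sketch of decoupled exploration/exploitation.

    Tabular Q-learning stands in for an off-policy learner such as SAC;
    this is an illustration, not the DEEP implementation.
    """

    def __init__(self, n_actions, beta=1.0, gamma=0.9, lr=0.5, seed=0):
        self.n_actions = n_actions
        self.beta = beta      # bonus scale (hypothetical hyperparameter)
        self.gamma = gamma    # discount factor
        self.lr = lr          # TD step size
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)       # visit counts N(s, a)
        self.buffer = []                     # replay buffer shared by both learners
        self.q_task = defaultdict(float)     # trained on task reward only
        self.q_explore = defaultdict(float)  # trained on task reward + bonus

    def bonus(self, state, action):
        # Count-based exploration bonus beta / sqrt(N(s, a)).
        return self.beta / math.sqrt(max(self.counts[(state, action)], 1))

    def act(self, state):
        # Data is collected by the exploration policy: greedy w.r.t. q_explore,
        # breaking ties at random.
        values = [self.q_explore[(state, a)] for a in range(self.n_actions)]
        best = max(values)
        return self.rng.choice([a for a, v in enumerate(values) if v == best])

    def act_task(self, state):
        # The task policy (greedy w.r.t. q_task) is the one you would evaluate.
        values = [self.q_task[(state, a)] for a in range(self.n_actions)]
        return max(range(self.n_actions), key=lambda a: values[a])

    def observe(self, s, a, r, s_next, done):
        # Record a transition collected by the exploration policy.
        self.counts[(s, a)] += 1
        self.buffer.append((s, a, r, s_next, done))

    def train(self, batch_size=32):
        # Both learners consume the same off-policy batch; only q_explore
        # sees the bonus, so q_task estimates values for the task reward alone.
        batch = self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))
        for s, a, r, s_next, done in batch:
            for q, reward in ((self.q_task, r),
                              (self.q_explore, r + self.bonus(s, a))):
                next_v = 0.0 if done else max(
                    q[(s_next, b)] for b in range(self.n_actions))
                target = reward + self.gamma * next_v
                q[(s, a)] += self.lr * (target - q[(s, a)])
```

In use, an environment loop would call `act` to choose actions, `observe` to store each transition, and `train` after every step, then evaluate `act_task`. With a single zero-reward transition, `q_explore` moves toward the bonus while `q_task` stays at zero. This is the separation the abstract argues keeps the bonus from biasing the task policy.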
Related papers
- Frugal Actor-critic: Sample Efficient Off-policy Deep Reinforcement Learning Using Unique Experiences (2024)
- Learning Off-policy With Model-based Intrinsic Motivation For Active Online Exploration (2024)
- Behavior-guided Actor-critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021)
- Diverse Exploration For Fast And Safe Policy Improvement (2018)
- Decoupled Reinforcement Learning To Stabilise Intrinsically-motivated Exploration (2021)
- Efficient Reinforcement Learning Via Decoupling Exploration And Utilization (2023)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Centralized Cooperative Exploration Policy For Continuous Control Tasks (2023)