Never Give Up: Learning Directed Exploration Strategies
2020 · Adrià Puigdomènech Badia, Pablo Sprechmann, Alex Vitvitskyi, et al.
Abstract
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies. We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies, thereby encouraging the agent to repeatedly revisit all states in its environment. A self-supervised inverse dynamics model is used to train the embeddings of the nearest neighbour lookup, biasing the novelty signal towards what the agent can control. We employ the framework of Universal Value Function Approximators (UVFA) to simultaneously learn many directed exploration policies with the same neural network, with different trade-offs between exploration and exploitation. By using the same neural network for different degrees of exploration/exploitation, transfer is demonstrated from predominantly exploratory policies yielding effective exploitative policies. The proposed method can be incorporated to
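The episodic novelty signal described in the abstract can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `episodic_reward`, the inverse-distance kernel, and the constants `k`, `eps`, and `c` are all hypothetical choices, and plain vectors stand in for the embeddings that the paper learns with an inverse dynamics model.

```python
import numpy as np

def episodic_reward(embedding, memory, k=10, eps=1e-3, c=1e-3):
    """Novelty bonus for `embedding` given embeddings stored this episode.

    Assumed form (illustrative only): an inverse-distance kernel over the
    k nearest neighbours acts as a soft visit count, and the bonus is the
    inverse square root of that count.
    """
    if len(memory) == 0:
        # Nothing stored yet: the soft count is zero, so the bonus is maximal.
        return 1.0 / np.sqrt(c)
    mem = np.asarray(memory, dtype=float)
    # Squared Euclidean distance from the query to every stored embedding.
    d2 = np.sum((mem - embedding) ** 2, axis=1)
    # Keep only the k smallest distances (the k nearest neighbours).
    nearest = np.sort(d2)[: min(k, len(d2))]
    # Kernel is close to 1 for near-duplicates and close to 0 for distant points.
    kernel = eps / (nearest + eps)
    return 1.0 / np.sqrt(kernel.sum() + c)

# Usage: the memory is cleared at the start of each episode and grows per step.
rng = np.random.default_rng(0)
memory = []
rewards = []
for _ in range(5):
    emb = rng.normal(size=4)  # stand-in for a learned embedding
    rewards.append(episodic_reward(emb, memory))
    memory.append(emb)
```

Under these assumptions, an embedding that closely matches something already in the episode's memory receives a small bonus, while one far from every stored embedding receives a large bonus.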
Related papers
- Coordinated Exploration Via Intrinsic Rewards For Multi-agent Reinforcement Learning (2019) 0.00
- Generative Adversarial Exploration For Reinforcement Learning (2022) 0.00
- Dynamic Subgoal-based Exploration Via Bayesian Optimization (2019) 0.00
- Deep Intrinsically Motivated Exploration In Continuous Control (2022) 0.00
- MULEX: Disentangling Exploitation From Exploration In Deep RL (2019) 0.00
- Fast Active Learning For Pure Exploration In Reinforcement Learning (2020) 0.00
- An Intrinsically-motivated Approach For Learning Highly Exploring And Fast Mixing Policies (2019) 6.34
- Unsupervised Learning Of Efficient Exploration: Pre-training Adaptive Policies Via Self-imposed Goals (2026) 0.00