Improving Exploration In Evolution Strategies For Deep Reinforcement Learning Via A Population Of Novelty-seeking Agents
2017 · Edoardo Conti, Vashisht Madhavan, Felipe Petroski Such, et al.
Abstract
Evolution strategies (ES) are a family of black-box optimization algorithms able to train deep neural networks roughly as well as Q-learning and policy gradient methods on challenging deep reinforcement learning (RL) problems, but are much faster (e.g. hours vs. days) because they parallelize better. However, many RL problems require directed exploration because they have reward functions that are sparse or deceptive (i.e. contain local optima), and it is unknown how to encourage such exploration with ES. Here we show that algorithms that have been invented to promote directed exploration in small-scale evolved neural networks via populations of exploring agents, specifically novelty search (NS) and quality diversity (QD) algorithms, can be hybridized with ES to improve its performance on sparse or deceptive deep RL tasks, while retaining scalability. Our experiments confirm that the resultant new algorithms, NS-ES and two QD algorithms, NSR-ES and NSRA-ES, avoid local optima encounter
Related papers
- PNS: Population-guided Novelty Search For Reinforcement Learning In Hard Exploration Environments (2018)
- Adaptive Combination Of A Genetic Algorithm And Novelty Search For Deep Neuroevolution (2022)
- Accelerating Reinforcement Learning With A Directional-gaussian-smoothing Evolution Strategy (2020)
- An Efficient Asynchronous Method For Integrating Evolutionary And Gradient-based Policy Search (2020)
- Supplementing Gradient-based Reinforcement Learning With Simple Evolutionary Ideas (2023)
- Deep Reinforcement Learning Versus Evolution Strategies: A Comparative Survey (2021)
- Novelty Search For Deep Reinforcement Learning Policy Network Weights By Action Sequence Edit Metric Distance (2019)
- Learning In Sparse Rewards Settings Through Quality-diversity Algorithms (2022)