Diversity Policy Gradient For Sample Efficient Quality-diversity Optimization
2020 · Thomas Pierrot, Valentin Macé, Félix Chalumeau, et al.
Abstract
A fascinating aspect of nature lies in its ability to produce a large and diverse collection of organisms that are all high-performing in their niche. By contrast, most AI algorithms focus on finding a single efficient solution to a given problem. Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off that plays a central role in learning. It also allows for increased robustness when the returned collection contains several working solutions to the considered problem, making it well-suited for real applications such as robotics. Quality-Diversity (QD) methods are evolutionary algorithms designed for this purpose. This paper proposes a novel algorithm, QDPG, which combines the strength of Policy Gradient algorithms and Quality Diversity approaches to produce a collection of diverse and high-performing neural policies in continuous control environments. The main contribution of this work is the introduction of a Diversity P
Related papers
- Approximating Gradients For Differentiable Quality Diversity In Reinforcement Learning (2022)
- Diversity-inducing Policy Gradient: Using Maximum Mean Discrepancy To Find A Set Of Diverse Policies (2019)
- DGPO: Discovering Multiple Strategies With Diversity-guided Policy Optimization (2022)
- Harnessing Distribution Ratio Estimators For Learning Agents With Quality And Diversity (2020)
- Synergizing Quality-diversity With Descriptor-conditioned Reinforcement Learning (2023)
- Phasic Diversity Optimization For Population-based Reinforcement Learning (2024)
- The Quality-diversity Transformer: Generating Behavior-conditioned Trajectories With Decision Transformers (2023)
- Policy Optimization By Genetic Distillation (2017)