Approximating Gradients For Differentiable Quality Diversity In Reinforcement Learning
2022 · Bryon Tjanaka, Matthew C. Fontaine, Julian Togelius, et al.
Abstract
Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent policies. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs compar
Related papers
- Harnessing Distribution Ratio Estimators For Learning Agents With Quality And Diversity (2020)
- Diversity Policy Gradient For Sample Efficient Quality-diversity Optimization (2020)
- Mitigating Suboptimality Of Deterministic Policy Gradients In Complex Q-functions (2024)
- Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms (2019)
- Diversity-inducing Policy Gradient: Using Maximum Mean Discrepancy To Find A Set Of Diverse Policies (2019)
- Learning In Sparse Rewards Settings Through Quality-diversity Algorithms (2022)
- Towards Adapting Reinforcement Learning Agents To New Tasks: Insights From Q-values (2024)
- Synergizing Quality-diversity With Descriptor-conditioned Reinforcement Learning (2023)