Harnessing Distribution Ratio Estimators For Learning Agents With Quality And Diversity
2020 · Tanmay Gangwani, Jian Peng, Yuan Zhou
Abstract
Quality-Diversity (QD) is a concept from neuroevolution with some intriguing applications to reinforcement learning. It facilitates learning a population of agents in which each member is optimized to simultaneously accumulate high task returns and exhibit behavioral diversity relative to the other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on the \(f\)-divergence between the stationary distributions of policies, we convert the problem to that of efficiently estimating the ratio of these stationary distributions. We then study various distribution ratio estimators used previously for off-policy evaluation and imitation, and repurpose them to compute the gradients for policies in an ensemble such that the resulting population is diverse and of high quality.
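The ratio-estimation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The logistic discriminator, the choice of the Jensen-Shannon divergence as the \(f\)-divergence, the exponential kernel form, the `temperature` parameter, and the 1-D Gaussian samples standing in for samples from two policies' stationary distributions are all assumptions made for illustration. The function names are hypothetical.

```python
import numpy as np

def fit_discriminator(xp, xq, steps=2000, lr=0.1):
    """Fit a logistic discriminator: label 1 for samples xp, label 0 for xq.

    Assumes len(xp) == len(xq), so that D/(1-D) estimates the ratio p/q.
    """
    x = np.concatenate([xp, xq])
    y = np.concatenate([np.ones(len(xp)), np.zeros(len(xq))])
    feats = np.stack([np.ones_like(x), x], axis=1)  # bias + linear feature
    w = np.zeros(feats.shape[1])
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(feats @ w)))
        w -= lr * feats.T @ (prob - y) / len(y)  # gradient of the mean log-loss
    return w

def logit(w, x):
    return w[0] + w[1] * x

def density_ratio(w, x):
    """Estimated p(x)/q(x): the discriminator odds D/(1-D) = exp(logit)."""
    return np.exp(logit(w, x))

def js_divergence(w, xp, xq):
    """Discriminator-based Jensen-Shannon estimate (in nats)."""
    log_d_p = -np.logaddexp(0.0, -logit(w, xp))    # log D(x) on samples from p
    log_1md_q = -np.logaddexp(0.0, logit(w, xq))   # log(1 - D(x)) on samples from q
    return np.log(2.0) + 0.5 * log_d_p.mean() + 0.5 * log_1md_q.mean()

def kernel(js, temperature=1.0):
    """Hypothetical similarity kernel: 1 for identical distributions, smaller otherwise."""
    return np.exp(-js / temperature)

rng = np.random.default_rng(0)
samples_a = rng.normal(0.0, 1.0, 2000)  # stand-in for policy A's visitation samples
samples_b = rng.normal(2.0, 1.0, 2000)  # stand-in for policy B's visitation samples
w_ab = fit_discriminator(samples_a, samples_b)
print("JS estimate:", js_divergence(w_ab, samples_a, samples_b))
print("kernel value:", kernel(js_divergence(w_ab, samples_a, samples_b)))
```

In this sketch, two policies that visit similar states yield a divergence near zero and a kernel value near one, while dissimilar policies yield a smaller kernel value. A kernel-based ensemble method could use a quantity of this kind as a repulsive term that pushes members apart.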
Related papers
- Approximating Gradients For Differentiable Quality Diversity In Reinforcement Learning (2022)
- Diversity Policy Gradient For Sample Efficient Quality-diversity Optimization (2020)
- Learning In Sparse Rewards Settings Through Quality-diversity Algorithms (2022)
- Synergizing Quality-diversity With Descriptor-conditioned Reinforcement Learning (2023)
- Distributional Reinforcement Learning With Quantile Regression (2017)
- The Quality-diversity Transformer: Generating Behavior-conditioned Trajectories With Decision Transformers (2023)
- Effective Diversity In Population Based Reinforcement Learning (2020)
- Likelihood Quantile Networks For Coordinating Multi-agent Reinforcement Learning (2018)