DGPO: Discovering Multiple Strategies With Diversity-guided Policy Optimization
2022 · Wentse Chen, Shiyu Huang, Yuan Chiang, et al.
Abstract
Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it is often valuable to learn a diverse set of solutions, for instance to make an agent's interaction with users more engaging or to improve a policy's robustness to unexpected perturbations. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately constrains the diversity of the strategies and the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variet
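To make the idea of an information-theoretic intrinsic reward concrete, here is a minimal sketch. It does not reproduce DGPO's exact reward, which the abstract does not spell out. Instead it assumes a common form from the skill-discovery literature: a discriminator predicts which strategy index z produced a state, and the reward is log q(z|s) - log p(z) with a uniform prior over strategies. The function name `diversity_reward`, the discriminator logits, and the uniform prior are all illustrative assumptions.

```python
import numpy as np


def diversity_reward(disc_logits, z, n_strategies):
    """Hypothetical intrinsic reward: log q(z|s) - log p(z).

    disc_logits: array of shape (batch, n_strategies), the output of an
        assumed discriminator that guesses the strategy index from a state.
    z: integer array of shape (batch,), the strategy index the shared
        policy was conditioned on when it visited each state.
    n_strategies: number of strategies; p(z) is assumed uniform.
    """
    # Numerically stable log-softmax over the strategy dimension.
    shifted = disc_logits - disc_logits.max(axis=-1, keepdims=True)
    log_q = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

    # Pick out log q(z|s) for the strategy that was actually used.
    log_q_z = log_q[np.arange(len(z)), z]

    # Subtract log p(z) = -log(n_strategies) for the uniform prior.
    return log_q_z + np.log(n_strategies)


# Usage: a discriminator that cannot tell strategies apart yields zero reward,
# while one that confidently identifies the strategy yields a positive reward.
uninformative = diversity_reward(np.zeros((2, 4)), np.array([0, 3]), 4)
confident = diversity_reward(
    np.array([[10.0, 0.0, 0.0, 0.0]]), np.array([0]), 4
)
```

Under these assumptions the reward is positive exactly when the visited state makes the strategy easier to identify than chance, which pushes strategies that share one policy network toward distinguishable behavior.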
Related papers
- Diversity-inducing Policy Gradient: Using Maximum Mean Discrepancy To Find A Set Of Diverse Policies (2019)
- Continuously Discovering Novel Strategies Via Reward-switching Policy Optimization (2022)
- Diverse Policies Converge In Reward-free Markov Decision Processes (2023)
- Phasic Diversity Optimization For Population-based Reinforcement Learning (2024)
- Diversity Policy Gradient For Sample Efficient Quality-diversity Optimization (2020)
- Discovering Diverse Multi-agent Strategic Behavior Via Reward Randomization (2021)
- Diverse Exploration For Fast And Safe Policy Improvement (2018)
- Effective Diversity In Population Based Reinforcement Learning (2020)