Multi-path Policy Optimization
2019 · Ling Pan, Qingpeng Cai, Longbo Huang
Abstract
Recent years have witnessed tremendous progress in deep reinforcement learning. However, agents can still suffer from inefficient exploration, particularly with on-policy methods. Previous exploration methods either rely on complex structures to estimate the novelty of states or introduce sensitive hyper-parameters that cause instability. We propose an efficient exploration method, Multi-Path Policy Optimization (MPPO), which does not incur high computation cost and ensures stability. MPPO maintains an efficient mechanism that uses a population of diverse policies to enable better exploration, especially in sparse environments. We also give a theoretical guarantee of stable performance. We build our scheme upon two widely adopted on-policy methods, the Trust-Region Policy Optimization algorithm and the Proximal Policy Optimization algorithm. We conduct extensive experiments on several MuJoCo tasks and their sparsified variants to fairly evaluate t
Related papers
- Multi-agent Guided Policy Optimization (2025)
- FP3O: Enabling Proximal Policy Optimization In Multi-agent Cooperation With Parameter-sharing Versatility (2023)
- Proximal Policy Optimization Algorithms (2017)
- Simple Policy Optimization (2024)
- Truly Proximal Policy Optimization (2019)
- Policy Optimization With Model-based Explorations (2018)
- Cautiously Optimistic Policy Optimization And Exploration With Linear Function Approximation (2021)
- ANO: A Principled Approach To Robust Policy Optimization (2026)