Towards Applicable Reinforcement Learning: Improving The Generalization And Sample Efficiency With Policy Ensemble
2022 · Zhengyu Yang, Kan Ren, Xufang Luo, et al.
Abstract
It is challenging for reinforcement learning (RL) algorithms to succeed in real-world applications such as financial trading and logistics systems, because observations are noisy and the environment shifts between training and evaluation. Resolving real-world tasks therefore requires both high sample efficiency and strong generalization. However, directly applying typical RL algorithms can lead to poor performance in such scenarios. Motivated by the strong accuracy and generalization of ensemble methods in supervised learning (SL), we design a robust and applicable method named Ensemble Proximal Policy Optimization (EPPO), which learns ensemble policies in an end-to-end manner. Notably, EPPO combines each policy and the policy ensemble organically and optimizes both simultaneously. In addition, EPPO adopts a diversity enhancement regularization over the policy space, which helps it generalize to unseen states and promotes exploration. We theoretically prove EPPO increases exploratio
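As a rough illustration of the two ideas named in the abstract (a policy ensemble built from several sub-policies, and a diversity term over the policy space), the following minimal sketch may help. The abstract does not state the aggregation rule or the diversity measure, so both are assumptions here: the ensemble is taken as a uniform average of sub-policy action distributions, and diversity is measured as mean pairwise KL divergence. All function names and logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution over actions."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_policy(sub_policy_probs):
    """Uniformly average K sub-policy distributions (assumed aggregation rule)."""
    k = len(sub_policy_probs)
    n_actions = len(sub_policy_probs[0])
    return [sum(p[a] for p in sub_policy_probs) / k for a in range(n_actions)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions, with eps for numerical safety."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def diversity_bonus(sub_policy_probs):
    """Mean pairwise KL between distinct sub-policies (assumed diversity measure)."""
    k = len(sub_policy_probs)
    if k < 2:
        return 0.0
    total = 0.0
    for i in range(k):
        for j in range(k):
            if i != j:
                total += kl_divergence(sub_policy_probs[i], sub_policy_probs[j])
    return total / (k * (k - 1))


# Hypothetical logits from three sub-policies for one state with three actions.
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.1, 0.4, 1.8]]
probs = [softmax(l) for l in logits]
mixture = ensemble_policy(probs)   # action distribution of the ensemble
bonus = diversity_bonus(probs)     # larger when sub-policies disagree more
```

In a training loop, a term like `bonus` would typically be added to the policy objective with a small coefficient, so that sub-policies are rewarded for disagreeing while the averaged `mixture` is still optimized for return. How EPPO actually weights and combines these terms is not recoverable from the abstract.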
Related papers
- SEERL: Sample Efficient Ensemble Reinforcement Learning (2020)
- Epopt: Learning Robust Neural Network Policies Using Model Ensembles (2016)
- Efficient Reinforcement Learning From Demonstration Using Local Ensemble And Reparameterization With Split And Merge Of Expert Policies (2022)
- Think Outside The Policy: In-context Steered Policy Optimization (2025)
- Learning Self-imitating Diverse Policies (2018)
- Enhancing Efficiency Of Safe Reinforcement Learning Via Sample Manipulation (2024)
- Provably Efficient Exploration In Policy Optimization (2019)
- Evolution-guided Policy Gradient In Reinforcement Learning (2018)