POLTER: Policy Trajectory Ensemble Regularization For Unsupervised Reinforcement Learning
2022 · Frederik Schubert, Carolin Benjamins, Sebastian Döhler, et al.
Abstract
The goal of Unsupervised Reinforcement Learning (URL) is to find a reward-agnostic prior policy on a task domain such that sample efficiency on supervised downstream tasks improves. Although agents initialized with such a prior policy can achieve significantly higher reward with fewer samples when finetuned on the downstream task, how to obtain an optimal pretrained prior policy in practice remains an open question. In this work, we present POLTER (Policy Trajectory Ensemble Regularization), a general method to regularize pretraining that can be applied to any URL algorithm and is especially useful for data- and knowledge-based URL algorithms. It uses an ensemble of policies discovered during pretraining and moves the policy of the URL algorithm closer to its optimal prior. Our method is based on a theoretical framework, and we analyze its practical effects on a white-box benchmark, which allows us to study POLTER with full control. In our main experime
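The regularization idea in the abstract (pull the current policy toward an ensemble of policies found during pretraining) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: tabular action distributions, an ensemble formed by uniformly averaging policy snapshots, a KL-divergence penalty, and the hypothetical names `ensemble_prior`, `kl_divergence`, `regularized_loss`, and `alpha`.

```python
import numpy as np


def ensemble_prior(snapshot_probs):
    """Uniformly average K policy snapshots (assumed ensemble rule).

    snapshot_probs has shape (K, S, A): K snapshots, S states, A actions.
    Returns an array of shape (S, A) whose rows are still distributions.
    """
    return np.mean(snapshot_probs, axis=0)


def kl_divergence(p, q, eps=1e-12):
    """Mean over states of KL(p || q) for arrays of shape (S, A)."""
    p = np.clip(p, eps, 1.0)  # avoid log(0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=1)))


def regularized_loss(url_loss, policy_probs, snapshot_probs, alpha=1.0):
    """Add a penalty (hypothetical weight alpha) to a given URL loss value.

    The penalty grows as the current policy drifts away from the
    averaged ensemble of earlier snapshots.
    """
    prior = ensemble_prior(snapshot_probs)
    return url_loss + alpha * kl_divergence(policy_probs, prior)


# Two snapshots over 2 states and 2 actions (made-up numbers).
snapshots = np.array([
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.1, 0.9], [0.5, 0.5]],
])
current_policy = np.array([[0.9, 0.1], [0.5, 0.5]])
loss = regularized_loss(url_loss=1.0, policy_probs=current_policy,
                        snapshot_probs=snapshots, alpha=0.5)
```

In this toy setup the penalty is zero when the current policy equals the averaged ensemble and positive otherwise, so `alpha` trades off the original URL objective against staying close to the ensemble.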
Related papers
- Entrpo: Trust Region Policy Optimization Method With Entropy Regularization (2021) · 0.00
- PROTO: Iterative Policy Regularized Offline-to-online Reinforcement Learning (2023) · 0.00
- Towards Applicable Reinforcement Learning: Improving The Generalization And Sample Efficiency With Policy Ensemble (2022) · 9.23
- Unified Policy Optimization For Continuous-action Reinforcement Learning In Non-stationary Tasks And Games (2022) · 2.26
- On The Importance Of Feature Decorrelation For Unsupervised Representation Learning In Reinforcement Learning (2023) · 0.00
- Regularization Matters In Policy Optimization (2019) · 2.68
- Reward-conditioned Policies (2019) · 0.00
- Learning Adaptive Exploration Strategies In Dynamic Environments Through Informed Policy Regularization (2020) · 0.00