Policy Optimization With Model-based Explorations
2018 · Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, et al.
Abstract
Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. Model-based reinforcement learning methods, which learn the transition dynamics, are more sample efficient but often suffer from bias in the transition estimates. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the trade-off between exploration and exploitation, which regards the difference between model-free and model-based estimations as a measure of exploration value. We apply this new technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions' target values: a
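The core idea stated in the abstract, treating the gap between a model-free and a model-based estimate as a measure of exploration value, can be illustrated with a small sketch. This is not the paper's implementation: the abstract is truncated before it defines the two components, so the function names, the absolute-difference form of the gap, the batch-mean centering, and the weight `alpha` are all assumptions made for illustration.

```python
from statistics import mean


def exploration_bonus(model_free_targets, model_based_targets):
    """Per-transition gap between the two target estimates.

    Assumption: the gap is measured as an absolute difference.
    """
    if len(model_free_targets) != len(model_based_targets):
        raise ValueError("both target lists must have the same length")
    return [abs(f - b) for f, b in zip(model_free_targets, model_based_targets)]


def explored_targets(model_free_targets, model_based_targets, alpha=0.1):
    """Shift each model-free target by a centered exploration bonus.

    Assumptions: `alpha` is a hypothetical weight, and the bonus is
    centered on its batch mean so that transitions where the two
    estimates disagree more than average get a higher target.
    """
    bonus = exploration_bonus(model_free_targets, model_based_targets)
    baseline = mean(bonus)
    return [f + alpha * (e - baseline) for f, e in zip(model_free_targets, bonus)]


if __name__ == "__main__":
    # Two transitions: the estimates agree on the first, disagree on the second.
    print(explored_targets([1.0, 2.0], [1.0, 4.0], alpha=0.5))
```

In this sketch a transition whose two estimates disagree strongly has its target raised relative to the rest of the batch, which would encourage a policy-gradient learner such as PPO to revisit it. How the two estimates are actually produced is left out, since the abstract does not say.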
Related papers
- Proximal Policy Optimization Via Enhanced Exploration Efficiency (2020)
- Proximal Policy Optimization With Adaptive Exploration (2024)
- Proximal Policy Optimization Algorithms (2017)
- Revisiting Design Choices In Proximal Policy Optimization (2020)
- Truly Proximal Policy Optimization (2019)
- PPO-CMA: Proximal Policy Optimization With Covariance Matrix Adaptation (2018)
- KIPPO: Koopman-inspired Proximal Policy Optimization (2025)
- Simple Policy Optimization (2024)