Colored Noise In PPO: Improved Exploration And Performance Through Correlated Action Sampling
2023 · Jakob Hollenstein, Georg Martius, Justus Piater
Abstract
Proximal Policy Optimization (PPO), a popular on-policy deep reinforcement learning method, employs a stochastic policy for exploration. In this paper, we propose a colored noise-based stochastic policy variant of PPO. Previous research highlighted the importance of temporal correlation in action noise for effective exploration in off-policy reinforcement learning. Building on this, we investigate whether correlated noise can also enhance exploration in on-policy methods like PPO. We discovered that correlated noise for action selection improves learning performance and outperforms the currently popular uncorrelated white noise approach in on-policy methods. Unlike off-policy learning, where pink noise was found to be highly effective, we found that a colored noise, intermediate between white and pink, performed best for on-policy learning in PPO. We examined the impact of varying the amount of data collected for each update by modifying the number of parallel simulation environments f
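To make the idea of correlated action sampling concrete, the following is a minimal sketch of one common way to generate colored noise: shaping a random spectrum so that its power falls off as 1/f^beta. Under the usual convention, beta = 0 gives white noise and beta = 1 gives pink noise, so "intermediate between white and pink" corresponds to 0 < beta < 1. The function names (`colored_noise`, `lag1_autocorr`, `sample_actions`), the value beta = 0.5, and the per-dimension normalization are illustrative assumptions; the abstract does not specify how the authors generate or apply the noise.

```python
import numpy as np


def colored_noise(beta, n_steps, n_dims, rng=None):
    """Sample noise whose power spectrum scales roughly as 1/f^beta over time.

    Returns an array of shape (n_steps, n_dims). Hypothetical helper,
    not the authors' implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    freqs = np.fft.rfftfreq(n_steps)
    freqs[0] = freqs[1]  # avoid dividing by zero at the DC bin
    scale = freqs ** (-beta / 2.0)  # amplitude ~ f^(-beta/2), power ~ f^(-beta)
    real = rng.standard_normal((n_dims, freqs.size))
    imag = rng.standard_normal((n_dims, freqs.size))
    spectrum = (real + 1j * imag) * scale
    noise = np.fft.irfft(spectrum, n=n_steps, axis=-1)
    # Simplification (assumption): rescale each dimension to zero mean, unit variance.
    noise -= noise.mean(axis=-1, keepdims=True)
    noise /= noise.std(axis=-1, keepdims=True)
    return noise.T


def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D sequence, a rough measure of temporal correlation."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


def sample_actions(means, stds, noise):
    """Perturb per-step policy means with a pre-generated noise sequence."""
    return means + stds * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for beta in (0.0, 0.5, 1.0):  # white, intermediate, pink
        eps = colored_noise(beta, n_steps=4096, n_dims=1, rng=rng)
        print(beta, lag1_autocorr(eps[:, 0]))
```

In this sketch, larger beta yields noise whose consecutive samples are more strongly correlated, so perturbations to the policy mean persist across time steps instead of averaging out.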
Related papers
- Proximal Policy Optimization Via Enhanced Exploration Efficiency (2020)
- The Surprising Effectiveness Of PPO In Cooperative, Multi-agent Games (2021)
- Proximal Policy Optimization Algorithms (2017)
- Policy Optimization With Model-based Explorations (2018)
- Revisiting Design Choices In Proximal Policy Optimization (2020)
- Proximal Policy Optimization With Adaptive Exploration (2024)
- Truly Proximal Policy Optimization (2019)
- PTR-PPO: Proximal Policy Optimization With Prioritized Trajectory Replay (2021)