PTR-PPO: Proximal Policy Optimization With Prioritized Trajectory Replay
2021 · Xingxing Liang, Yang Ma, Yanghe Feng, et al.
Abstract
On-policy deep reinforcement learning algorithms have low data utilization and require significant experience for policy improvement. This paper proposes a proximal policy optimization algorithm with prioritized trajectory replay (PTR-PPO) that combines on-policy and off-policy methods to improve sampling efficiency by prioritizing the replay of trajectories generated by old policies. We first design three trajectory priorities based on the characteristics of trajectories: the first two being max and mean trajectory priorities based on one-step empirical generalized advantage estimation (GAE) values and the last being reward trajectory priorities based on normalized undiscounted cumulative reward. Then, we incorporate the prioritized trajectory replay into the PPO algorithm, propose a truncated importance weight method to overcome the high variance caused by large importance weights under multistep experience, and design a policy improvement loss function for PPO under off-policy condi
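The three trajectory priorities and the truncated importance weight named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the use of absolute GAE values, the min-max normalization of returns, the proportional sampling rule, and the per-step truncation threshold `c` are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
import random


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def max_priority(advantages):
    # Assumption: priority is the largest absolute GAE value in the trajectory.
    return max(abs(a) for a in advantages)


def mean_priority(advantages):
    # Assumption: priority is the mean absolute GAE value in the trajectory.
    return sum(abs(a) for a in advantages) / len(advantages)


def reward_priorities(returns, eps=1e-8):
    # Assumption: min-max normalization of undiscounted returns across the buffer.
    lo, hi = min(returns), max(returns)
    return [(r - lo) / (hi - lo + eps) for r in returns]


def sample_index(priorities, alpha=1.0, rng=random):
    # Hypothetical proportional sampling: P(i) is proportional to (p_i + small)^alpha.
    weights = [(p + 1e-6) ** alpha for p in priorities]
    return rng.choices(range(len(priorities)), weights=weights, k=1)[0]


def truncated_importance_weights(new_logp, old_logp, c=1.0):
    # Per-step ratio pi_new / pi_old, clipped from above at c to limit variance.
    return [min(c, math.exp(n - o)) for n, o in zip(new_logp, old_logp)]
```

In this sketch, a replay buffer would store one priority per trajectory, draw a trajectory with `sample_index`, and scale its contribution to the policy loss by the truncated weights. Capping the ratio at `c` trades some bias for lower variance when the stored trajectory comes from a much older policy.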
Related papers
- Proximal Policy Optimization Algorithms (2017)
- Truly Proximal Policy Optimization (2019)
- TIC-GRPO: Provable And Efficient Optimization For Reinforcement Learning From Human Feedback (2025)
- Simple Policy Optimization (2024)
- Proximal Policy Optimization Via Enhanced Exploration Efficiency (2020)
- Policy Optimization With Model-based Explorations (2018)
- Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization (2018)
- Reparameterization Proximal Policy Optimization (2025)