Reusing Trajectories In Policy Gradients Enables Fast Convergence
2025 · Alessandro Montenegro, Federico Mansutti, Marco Mussi, et al.
Abstract
Policy gradient (PG) methods are a class of effective reinforcement learning algorithms, particularly when dealing with continuous control problems. They rely on fresh on-policy data, making them sample-inefficient and requiring \(O(\epsilon^{-2})\) trajectories to reach an \(\epsilon\)-approximate stationary point. A common strategy to improve efficiency is to reuse information from past iterations, such as previous gradients or trajectories, leading to off-policy PG methods. While gradient reuse has received substantial attention, leading to improved rates up to \(O(\epsilon^{-3/2})\), the reuse of past trajectories, although intuitive, remains largely unexplored from a theoretical perspective. In this work, we provide the first rigorous theoretical evidence that reusing past off-policy trajectories can significantly accelerate PG convergence. We propose RT-PG (Reusing Trajectories - Policy Gradient), a novel algorithm that leverages a power mean-corrected multiple importance wei
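To make the idea of trajectory reuse concrete, the following is a minimal sketch of an importance-weighted policy gradient estimate computed only from trajectories collected under an older policy. It is not RT-PG: it uses a plain single-behaviour importance weight rather than the power mean-corrected multiple importance weighting named in the abstract, whose definition is not available here. The toy task, the state-free Gaussian policy, and every name and constant (`SIGMA`, `HORIZON`, `TARGET`, `reused_gradient`, and so on) are assumptions chosen for illustration.

```python
import math
import random

SIGMA = 1.0      # fixed policy standard deviation (assumed)
HORIZON = 3      # actions per trajectory (assumed)
TARGET = 1.0     # reward peak of the toy task (assumed)


def sample_trajectory(theta, rng):
    """Draw HORIZON actions from the Gaussian policy N(theta, SIGMA^2)."""
    return [rng.gauss(theta, SIGMA) for _ in range(HORIZON)]


def trajectory_return(actions):
    """Toy return: negative squared distance of each action from TARGET."""
    return -sum((a - TARGET) ** 2 for a in actions)


def log_prob(actions, theta):
    """Log-density of a whole trajectory under the policy with mean theta."""
    const = -0.5 * math.log(2.0 * math.pi * SIGMA ** 2)
    return sum(const - 0.5 * ((a - theta) / SIGMA) ** 2 for a in actions)


def score(actions, theta):
    """Gradient of log_prob with respect to theta."""
    return sum((a - theta) / SIGMA ** 2 for a in actions)


def reused_gradient(trajectories, theta_old, theta_new):
    """Estimate dJ/dtheta at theta_new from trajectories sampled at theta_old."""
    total = 0.0
    for actions in trajectories:
        # Importance weight: target density over behaviour density.
        weight = math.exp(log_prob(actions, theta_new) - log_prob(actions, theta_old))
        total += weight * score(actions, theta_new) * trajectory_return(actions)
    return total / len(trajectories)


rng = random.Random(0)
theta_old, theta_new = 0.0, 0.2
# Trajectories are collected once, under the old policy only.
old_batch = [sample_trajectory(theta_old, rng) for _ in range(100_000)]
estimate = reused_gradient(old_batch, theta_old, theta_new)
exact = -2.0 * HORIZON * (theta_new - TARGET)  # closed form for this toy task
print(f"reused-trajectory estimate: {estimate:.3f}, closed form: {exact:.3f}")
```

No fresh samples are drawn at `theta_new`; the weight alone corrects for the mismatch between the two policies. The variance of such weights grows as the policies move apart, which is one reason corrected or multiple importance weighting schemes are studied.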
Related papers
- Variance Reduction Based Partial Trajectory Reuse To Accelerate Policy Gradient Optimization (2022)
- Reusing Historical Trajectories In Natural Policy Gradient Via Importance Sampling: Convergence And Convergence Rate (2024)
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)
- PTR-PPO: Proximal Policy Optimization With Prioritized Trajectory Replay (2021)
- Stochastic Policy Gradient Methods: Improved Sample Complexity For Fisher-non-degenerate Policies (2023)
- MDPGT: Momentum-based Decentralized Policy Gradient Tracking (2021)
- Smoothing Policies And Safe Policy Gradients (2019)
- Mixed Policy Gradient: Off-policy Reinforcement Learning Driven Jointly By Data And Model (2021)