Reusing Historical Trajectories In Natural Policy Gradient Via Importance Sampling: Convergence And Convergence Rate
2024 · Yifan Lin, Yuhao Wang, Enlu Zhou
Abstract
Reinforcement learning provides a mathematical framework for learning-based control, whose success largely depends on the amount of data it can utilize. The efficient utilization of historical trajectories obtained from previous policies is essential for expediting policy optimization. Empirical evidence has shown that policy gradient methods based on importance sampling work well. However, the existing literature often neglects the interdependence between trajectories from different iterations, and the good empirical performance lacks a rigorous theoretical justification. In this paper, we study a variant of the natural policy gradient method that reuses historical trajectories via importance sampling. We show that the bias of the proposed gradient estimator is asymptotically negligible, that the resultant algorithm is convergent, and that reusing past trajectories helps improve the convergence rate. We further apply the proposed estimator to popular policy optimization algorithms such as tr
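To make the idea of reusing past samples concrete, the following is a minimal sketch of an importance-weighted policy gradient estimate. It is not the estimator proposed in the paper: it assumes a one-step problem with a softmax policy, uses the vanilla (not natural) gradient, and all names (`softmax`, `grad_log_prob`, `is_gradient`) are hypothetical. Each stored sample carries the probability its action had under the older policy that generated it, and the likelihood ratio reweights it toward the current policy.

```python
import math


def softmax(theta):
    """Action probabilities of a softmax policy with logits theta."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_prob(theta, action):
    """Gradient of log pi_theta(action): one-hot(action) minus probs."""
    probs = softmax(theta)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def is_gradient(theta, samples):
    """Importance-weighted gradient estimate at the current theta.

    samples: list of (action, reward, behavior_prob), where behavior_prob
    is the probability of the action under the past policy that drew it
    (an assumption of this sketch: that probability was stored).
    """
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for action, reward, behavior_prob in samples:
        weight = probs[action] / behavior_prob  # likelihood ratio
        score = grad_log_prob(theta, action)
        for i in range(len(theta)):
            grad[i] += weight * reward * score[i]
    return [g / len(samples) for g in grad]
```

When the behavior probability equals the current one, the weight is 1 and the estimate reduces to the usual score-function gradient. The sketch treats samples as independent; the interdependence between trajectories from different iterations, which the abstract highlights, is exactly what it ignores.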
Related papers
- Reusing Trajectories In Policy Gradients Enables Fast Convergence (2025)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- Variance Reduction Based Partial Trajectory Reuse To Accelerate Policy Gradient Optimization (2022)
- Natural Policy Gradients In Reinforcement Learning Explained (2022)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- On The Convergence Of Experience Replay In Policy Optimization: Characterizing Bias, Variance, And Finite-time Convergence (2021)
- Improved Sample Complexity Analysis Of Natural Policy Gradient Algorithm With General Parameterization For Infinite Horizon Discounted Reward Markov Decision Processes (2023)
- On The Convergence Of Discounted Policy Gradient Methods (2022)