Variance Reduction Based Partial Trajectory Reuse To Accelerate Policy Gradient Optimization
2022 · Hua Zheng, Wei Xie
Abstract
Building on our previous study of green simulation assisted policy gradient (GS-PG), which focused on trajectory-based reuse, this paper considers infinite-horizon Markov Decision Processes and develops a new importance sampling based policy gradient optimization approach to support dynamic decision making. The existing GS-PG method was designed to learn from complete episodes or process trajectories, which limits its applicability to low-data situations and flexible online process control. To overcome this limitation, the proposed approach selectively reuses the most related partial trajectories, i.e., the reuse unit is based on per-step or per-decision historical observations. Specifically, we create a mixture likelihood ratio (MLR) based policy gradient optimization that can leverage the information from historical state-action transitions generated under different behavioral policies. The proposed variance reduction experience replay (VRER) approach can intelligently select and reuse
Related papers
- Reusing Trajectories In Policy Gradients Enables Fast Convergence (2025)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)
- Trajectory-wise Control Variates For Variance Reduction In Policy Gradient Methods (2019)
- Merging Deterministic Policy Gradient Estimations With Varied Bias-variance Tradeoff For Effective Deep Reinforcement Learning (2019)
- Sample Efficient Policy Gradient Methods With Recursive Variance Reduction (2019)
- MDPGT: Momentum-based Decentralized Policy Gradient Tracking (2021)
- Reusing Historical Trajectories In Natural Policy Gradient Via Importance Sampling: Convergence And Convergence Rate (2024)