Adaptive Experience Selection For Policy Gradient
2020 · Saad Mohamad, Giovanni Montana
Abstract
Policy gradient reinforcement learning (RL) algorithms have achieved impressive performance in challenging learning tasks such as continuous control, but suffer from high sample complexity. Experience replay is a commonly used approach to improve sample efficiency, but gradient estimators using past trajectories typically have high variance. Existing sampling strategies for experience replay like uniform sampling or prioritised experience replay do not explicitly try to control the variance of the gradient estimates. In this paper, we propose an online learning algorithm, adaptive experience selection (AES), to adaptively learn an experience sampling distribution that explicitly minimises this variance. Using a regret minimisation approach, AES iteratively updates the experience sampling distribution to match the performance of a competitor distribution assumed to have optimal variance. Sample non-stationarity is addressed by proposing a dynamic (i.e. time changing) competitor distribu
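The variance argument in the abstract can be illustrated with a small sketch. This is not the AES algorithm: it leaves out the regret-minimisation update and the dynamic competitor distribution. It only shows the underlying importance-sampling fact that drawing stored experiences with probability roughly proportional to their gradient magnitude gives a lower-variance (and still unbiased) estimate than uniform sampling. The scalar "gradients", the function names, and the `mix` parameter are all assumptions made for illustration.

```python
import random


def sampling_distribution(grad_norms, mix=0.1):
    """Probabilities proportional to gradient norms, mixed with uniform.

    The uniform mixture (hypothetical `mix` parameter) keeps every
    probability strictly positive so the importance weights stay bounded.
    """
    n = len(grad_norms)
    total = sum(grad_norms)
    if total == 0:
        return [1.0 / n] * n
    return [(1.0 - mix) * g / total + mix / n for g in grad_norms]


def estimator_mean(grads, probs):
    """Expected value of the one-sample estimator g_i / (n * p_i), i ~ probs."""
    n = len(grads)
    return sum(p * (g / (n * p)) for g, p in zip(grads, probs))


def estimator_variance(grads, probs):
    """Exact variance of the one-sample estimator g_i / (n * p_i), i ~ probs."""
    n = len(grads)
    mean = estimator_mean(grads, probs)
    second_moment = sum(p * (g / (n * p)) ** 2 for g, p in zip(grads, probs))
    return second_moment - mean ** 2


def draw_estimate(grads, probs, rng):
    """Draw one experience index from probs and return the reweighted gradient."""
    i = rng.choices(range(len(grads)), weights=probs, k=1)[0]
    return grads[i] / (len(grads) * probs[i])


# Toy replay buffer: one experience has a much larger gradient than the rest.
grads = [0.1, 0.2, 5.0, 0.1]
uniform = [1.0 / len(grads)] * len(grads)
adaptive = sampling_distribution([abs(g) for g in grads])

print("variance under uniform sampling: ", estimator_variance(grads, uniform))
print("variance under adaptive sampling:", estimator_variance(grads, adaptive))
print("one sampled estimate:", draw_estimate(grads, adaptive, random.Random(0)))
```

Both distributions give the same expected value, the plain average of the stored gradients, so the comparison isolates the variance effect. In a real setting the per-experience gradient norms are not known in advance, which is the gap an online method such as the one described above is meant to close.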
Related papers
- On The Convergence Of Experience Replay In Policy Optimization: Characterizing Bias, Variance, And Finite-time Convergence (2021)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Large Batch Experience Replay (2021)
- Safe And Robust Experience Sharing For Deterministic Policy Gradient Algorithms (2022)
- Off-policy Correction For Deep Deterministic Policy Gradient Algorithms Via Batch Prioritized Experience Replay (2021)
- Variance Reduction Based Partial Trajectory Reuse To Accelerate Policy Gradient Optimization (2022)
- Replay For Safety (2021)
- An Efficient Asynchronous Method For Integrating Evolutionary And Gradient-based Policy Search (2020)