Off-policy Correction For Deep Deterministic Policy Gradient Algorithms Via Batch Prioritized Experience Replay
2021 · Dogan C. Cicek, Enes Duran, Baturay Saglam, et al.
Abstract
The experience replay mechanism allows agents to reuse experiences multiple times. In prior works, the sampling probability of each transition was adjusted according to its importance. Reassigning sampling probabilities for every transition in the replay buffer after each iteration is highly inefficient. Therefore, to gain computational efficiency, experience replay prioritization algorithms recalculate the significance of a transition when that transition is sampled. However, the importance of the transitions changes dynamically as the agent's policy and value function are updated. In addition, the replay buffer stores transitions generated by the agent's previous policies, which may deviate significantly from its most recent policy. Higher deviation from the most recent policy leads to more off-policy updates, which is detrimental to the agent. In this paper, we develop a novel algorithm, Batch Prioritizing Experien
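The prior-work mechanism the abstract describes, where sampling probabilities follow importance and a transition's priority is recomputed when it is sampled, can be sketched roughly as below. This is a minimal illustration of generic proportional prioritization, not the algorithm proposed in this paper. The class name `PrioritizedReplayBuffer`, the parameters `alpha` and `eps`, and the use of the absolute TD error as the importance measure are all assumptions made for the example.

```python
import random


class PrioritizedReplayBuffer:
    """Toy proportional prioritized replay (hypothetical sketch, not the paper's method)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=None):
        self.capacity = capacity
        self.alpha = alpha  # assumed exponent: 0 gives uniform sampling
        self.eps = eps      # assumed offset that keeps priorities strictly positive
        self.transitions = []
        self.priorities = []
        self.next_slot = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        # New transitions start at the current maximum priority,
        # so they are likely to be sampled before their priority goes stale.
        priority = max(self.priorities, default=1.0)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer is full: overwrite the oldest slot.
            self.transitions[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size):
        # Sample indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = self.rng.choices(
            range(len(self.transitions)), weights=weights, k=batch_size
        )
        return indices, [self.transitions[i] for i in indices]

    def update_priorities(self, indices, td_errors):
        # Only the sampled transitions are refreshed; every other
        # transition keeps the priority from its last visit.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps


# Usage: fill the buffer, draw a batch, then refresh only the sampled entries.
buffer = PrioritizedReplayBuffer(capacity=4, seed=0)
for t in range(4):
    buffer.add(("state", "action", float(t), "next_state"))
indices, batch = buffer.sample(batch_size=2)
buffer.update_priorities(indices, td_errors=[0.5, 2.0])
```

Because `update_priorities` touches only the sampled indices, the priorities of all other transitions can drift out of date as the policy and value function change, which is the staleness the abstract points to.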
Related papers
- CUER: Corrected Uniform Experience Replay For Off-policy Continuous Deep Reinforcement Learning Algorithms (2024)
- Large Batch Experience Replay (2021)
- Regret Minimization Experience Replay In Off-policy Reinforcement Learning (2021)
- Safe And Robust Experience Sharing For Deterministic Policy Gradient Algorithms (2022)
- On The Convergence Of Experience Replay In Policy Optimization: Characterizing Bias, Variance, And Finite-time Convergence (2021)
- Stratified Experience Replay: Correcting Multiplicity Bias In Off-policy Reinforcement Learning (2021)
- Adaptive Experience Selection For Policy Gradient (2020)
- MAC-PO: Multi-agent Experience Replay Via Collective Priority Optimization (2023)