CUER: Corrected Uniform Experience Replay For Off-policy Continuous Deep Reinforcement Learning Algorithms
2024 · Arda Sarp Yenicesu, Furkan B. Mutlu, Suleyman S. Kozat, et al.
Abstract
The utilization of the experience replay mechanism enables agents to effectively leverage their experiences on several occasions. In previous studies, the sampling probability of the transitions was modified based on their relative significance. The process of reassigning sample probabilities for every transition in the replay buffer after each iteration is considered extremely inefficient. Hence, in order to enhance computing efficiency, experience replay prioritization algorithms reassess the importance of a transition as it is sampled. However, the relative importance of the transitions undergoes dynamic adjustments when the agent's policy and value function are iteratively updated. Furthermore, experience replay is a mechanism that retains the transitions generated by the agent's past policies, which could potentially diverge significantly from the agent's most recent policy. An increased deviation from the agent's most recent policy results in a greater frequency of off-policy upd
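The lazy re-prioritization described in the abstract, where a transition's importance is reassessed only when it is sampled, can be sketched as follows. This is a minimal illustration of that general idea and not the CUER algorithm itself; the class name `LazyPrioritizedBuffer`, its methods, and the choice to give new transitions the current maximum priority are all assumptions made for the example.

```python
import random


class LazyPrioritizedBuffer:
    """Toy replay buffer (hypothetical) that re-scores only sampled transitions."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.transitions = []   # stored transitions (any Python object)
        self.priorities = []    # one positive priority per stored transition
        self.next_slot = 0      # slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition):
        # Assumption: a new transition receives the current maximum priority,
        # so it is likely to be sampled at least once.
        priority = max(self.priorities, default=1.0)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(priority)
        else:
            self.transitions[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
            self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size):
        # Draw indices with probability proportional to the stored priorities.
        indices = self.rng.choices(
            range(len(self.transitions)), weights=self.priorities, k=batch_size
        )
        return indices, [self.transitions[i] for i in indices]

    def update_priorities(self, indices, new_priorities):
        # Only the sampled transitions are re-scored; every other transition
        # keeps a priority computed under an older policy and value function.
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = max(float(p), 1e-6)
```

In a training loop, one would typically call `sample`, compute a fresh importance score (for example an absolute temporal-difference error) for the returned batch, and pass it to `update_priorities`. Priorities of transitions that were not sampled stay stale, which is the drawback the abstract points to.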
Related papers
- Off-policy Correction For Deep Deterministic Policy Gradient Algorithms Via Batch Prioritized Experience Replay (2021)
- Stratified Experience Replay: Correcting Multiplicity Bias In Off-policy Reinforcement Learning (2021)
- Replay For Safety (2021)
- Experience Replay Using Transition Sequences (2017)
- On The Convergence Of Experience Replay In Policy Optimization: Characterizing Bias, Variance, And Finite-time Convergence (2021)
- Safe And Robust Experience Sharing For Deterministic Policy Gradient Algorithms (2022)
- Large Batch Experience Replay (2021)
- Frugal Actor-critic: Sample Efficient Off-policy Deep Reinforcement Learning Using Unique Experiences (2024)