Stratified Experience Replay: Correcting Multiplicity Bias In Off-policy Reinforcement Learning
2021 · Brett Daley, Cameron Hickert, Christopher Amato
Abstract
Deep Reinforcement Learning (RL) methods rely on experience replay to approximate the minibatched supervised learning setting; however, unlike supervised learning where access to lots of training data is crucial to generalization, replay-based deep RL appears to struggle in the presence of extraneous data. Recent works have shown that the performance of Deep Q-Network (DQN) degrades when its replay memory becomes too large. This suggests that outdated experiences somehow impact the performance of deep RL, which should not be the case for off-policy methods like DQN. Consequently, we re-examine the motivation for sampling uniformly over a replay memory, and find that it may be flawed when using function approximation. We show that -- despite conventional wisdom -- sampling from the uniform distribution does not yield uncorrelated training samples and therefore biases gradients during training. Our theory prescribes a special non-uniform distribution to cancel this effect, and we propo
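The difference between uniform sampling and a non-uniform alternative can be sketched in a few lines. The abstract is cut off before it names the proposed method, so the following is only an illustration under stated assumptions: transitions are grouped into strata keyed by their (state, action) pair, states are discrete and hashable, and the class and method names (`StratifiedReplaySketch`, `sample_uniform`, `sample_stratified`) are hypothetical rather than taken from the paper. There is no capacity limit or eviction, which a practical replay memory would need.

```python
import random
from collections import defaultdict


class StratifiedReplaySketch:
    """Hypothetical replay memory that groups transitions by (state, action).

    Assumption: states and actions are hashable (e.g. small discrete tasks).
    """

    def __init__(self, seed=0):
        self.strata = defaultdict(list)  # (state, action) -> list of transitions
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.strata[(state, action)].append((state, action, reward, next_state))

    def sample_uniform(self):
        # Baseline: every stored transition is equally likely, so a pair that
        # was visited often is drawn proportionally more often.
        flat = [t for transitions in self.strata.values() for t in transitions]
        return self.rng.choice(flat)

    def sample_stratified(self):
        # Two-stage draw: pick a (state, action) pair uniformly, then pick one
        # of its stored transitions uniformly. Each pair is equally likely
        # regardless of how many copies the memory holds.
        key = self.rng.choice(list(self.strata))
        return self.rng.choice(self.strata[key])


def fraction_with_key(sampler, key, n=10_000):
    """Fraction of n draws whose (state, action) equals key."""
    hits = sum(1 for _ in range(n) if sampler()[:2] == key)
    return hits / n


# Toy memory: one pair is stored nine times, another only once.
memory = StratifiedReplaySketch(seed=0)
for _ in range(9):
    memory.add("s0", 0, 0.0, "s1")
memory.add("s1", 1, 1.0, "s0")

uniform_share = fraction_with_key(memory.sample_uniform, ("s0", 0))
stratified_share = fraction_with_key(memory.sample_stratified, ("s0", 0))
print(f"uniform: {uniform_share:.2f}, stratified: {stratified_share:.2f}")
```

In this toy memory, uniform sampling draws the frequently visited pair in roughly nine out of ten samples, while the two-stage draw selects each pair about half the time. This is one way to read the abstract's point that the multiplicity of stored experiences shapes which transitions drive the gradient.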
Related papers
- CUER: Corrected Uniform Experience Replay For Off-policy Continuous Deep Reinforcement Learning Algorithms (2024)
- Stabilising Experience Replay For Deep Multi-agent Reinforcement Learning (2017)
- Replay For Safety (2021)
- Large Batch Experience Replay (2021)
- Frugal Actor-critic: Sample Efficient Off-policy Deep Reinforcement Learning Using Unique Experiences (2024)
- Off-policy Correction For Deep Deterministic Policy Gradient Algorithms Via Batch Prioritized Experience Replay (2021)
- Remember And Forget For Experience Replay (2018)
- On The Convergence Of Experience Replay In Policy Optimization: Characterizing Bias, Variance, And Finite-time Convergence (2021)