Large Batch Experience Replay
2021 · Thibault Lahire, Matthieu Geist, Emmanuel Rachelson
Abstract
Several algorithms have been proposed to sample the replay buffer of deep Reinforcement Learning (RL) agents non-uniformly in order to speed up learning, but very few theoretical foundations have been provided for these sampling schemes. Among others, Prioritized Experience Replay appears as a hyperparameter-sensitive heuristic, even though it can provide good performance. In this work, we cast the replay buffer sampling problem as an importance sampling one for estimating the gradient. This allows deriving the theoretically optimal sampling distribution, yielding the best theoretical convergence speed. Building on the knowledge of the ideal sampling scheme, we exhibit new theoretical foundations of Prioritized Experience Replay. Since the optimal sampling distribution is intractable, we make several approximations that provide good results in practice and introduce, among others, LaBER (Large Batch Experience Replay), an easy-to-code and efficient method for sampling the replay buffer. LaBER, whic
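The importance-sampling view described in the abstract can be pictured with a minimal NumPy sketch. This is not the authors' implementation, and it rests on assumptions the abstract does not spell out: that a large batch is first drawn uniformly from the buffer, that each transition in it receives a positive priority (a hypothetical stand-in such as an absolute TD error), and that a mini-batch is then drawn in proportion to those priorities, with weights 1 / (N p_i) correcting the resulting bias. The function name `large_batch_downsample` and its parameters are illustrative only.

```python
import numpy as np


def large_batch_downsample(priorities, batch_size, rng):
    """Draw `batch_size` indices from a large batch, proportionally to priority.

    Returns the chosen indices and importance weights 1 / (N * p_i), so that
    the mean of `weights * f[idx]` is an unbiased estimate of the plain mean
    of `f` over the large batch (N = large-batch size).
    NOTE: illustrative sketch; names and interface are assumptions.
    """
    priorities = np.asarray(priorities, dtype=np.float64)
    if priorities.ndim != 1 or priorities.size == 0:
        raise ValueError("priorities must be a non-empty 1-D array")
    if np.any(priorities <= 0):
        raise ValueError("priorities must be strictly positive")

    # Normalise priorities into a sampling distribution over the large batch.
    probs = priorities / priorities.sum()

    # Sample the mini-batch with replacement, proportionally to priority.
    idx = rng.choice(priorities.size, size=batch_size, replace=True, p=probs)

    # Importance weights undo the bias introduced by non-uniform sampling.
    weights = 1.0 / (priorities.size * probs[idx])
    return idx, weights


# Hypothetical usage: 256 uniformly drawn transitions reduced to 32.
rng = np.random.default_rng(0)
surrogate_priorities = rng.uniform(0.1, 1.0, size=256)  # stand-in values
batch_idx, batch_weights = large_batch_downsample(surrogate_priorities, 32, rng)
```

In the scalar case, setting the priorities to the absolute values of the quantity being averaged makes every weighted sample equal to the large-batch mean, which is the zero-variance situation that motivates prioritizing by magnitude.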
Related papers
- Off-policy Correction For Deep Deterministic Policy Gradient Algorithms Via Batch Prioritized Experience Replay (2021)
- Replay For Safety (2021)
- A Deeper Look At Experience Replay (2017)
- Introspective Experience Replay: Look Back When Surprised (2022)
- Regret Minimization Experience Replay In Off-policy Reinforcement Learning (2021)
- CUER: Corrected Uniform Experience Replay For Off-policy Continuous Deep Reinforcement Learning Algorithms (2024)
- Prioritized Generative Replay (2024)
- Learning To Sample With Local And Global Contexts In Experience Replay Buffer (2020)