Replay For Safety
2021 · Liran Szlak, Ohad Shamir
Abstract
Experience replay \citep{lin1993reinforcement, mnih2015human} is a widely used technique for using data efficiently and improving performance in RL algorithms. In experience replay, past transitions are stored in a memory buffer and reused during learning. Previous works have proposed various schemes for sampling from the replay buffer, attempting to choose the experiences that contribute most to convergence to an optimal policy. Here, we give some conditions on the replay sampling scheme that ensure convergence, focusing on the well-known Q-learning algorithm in the tabular setting. After establishing sufficient conditions for convergence, we suggest a slightly different use for experience replay: replaying memories in a biased manner as a means of changing the properties of the resulting policy. We initiate a rigorous study of experience replay as a tool to control and modify the properties of the resulting policy. In p
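The mechanism the abstract describes (stored transitions resampled for tabular Q-learning updates) can be illustrated with a minimal sketch. This is not the authors' algorithm or their sampling conditions. The function name `q_learning_with_replay`, the `sample_weight` hook, the toy two-state MDP, and all hyperparameters are assumptions made for illustration.

```python
import random
from collections import defaultdict


def q_learning_with_replay(transitions, n_actions, alpha=0.1, gamma=0.9,
                           replay_steps=5000, sample_weight=None, seed=0):
    """Tabular Q-learning that learns only by replaying a fixed buffer.

    transitions: list of (state, action, reward, next_state, done) tuples.
    sample_weight: hypothetical hook mapping a transition to a non-negative
        sampling weight; None means uniform replay.
    """
    rng = random.Random(seed)
    buffer = list(transitions)
    if sample_weight is None:
        weights = [1.0] * len(buffer)
    else:
        weights = [sample_weight(t) for t in buffer]

    # Q[state] is a list with one value per action, initialised to zero.
    Q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(replay_steps):
        # Draw one stored transition according to the replay weights.
        s, a, r, s2, done = rng.choices(buffer, weights=weights, k=1)[0]
        # Standard Q-learning target: no bootstrapping on terminal steps.
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
    return Q


# Toy MDP (assumed): from state 0, action 0 leads to state 1 with no reward,
# action 1 ends the episode with reward 0.5; from state 1, action 0 ends the
# episode with reward 1.0 and action 1 ends it with reward 0.0.
toy_buffer = [
    (0, 0, 0.0, 1, False),
    (0, 1, 0.5, None, True),
    (1, 0, 1.0, None, True),
    (1, 1, 0.0, None, True),
]
Q_uniform = q_learning_with_replay(toy_buffer, n_actions=2)
```

Passing a non-uniform `sample_weight` changes how often each stored transition is replayed, which is the kind of biased replay the abstract refers to. Whether a given weighting still converges depends on conditions such as those studied in the paper, and this sketch does not check them.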
Related papers
- Convergence Results For Q-learning With Experience Replay (2021)
- Experience Replay Using Transition Sequences (2017)
- Introspective Experience Replay: Look Back When Surprised (2022)
- Large Batch Experience Replay (2021)
- A Deeper Look At Experience Replay (2017)
- Remember And Forget For Experience Replay (2018)
- On The Convergence Of Experience Replay In Policy Optimization: Characterizing Bias, Variance, And Finite-time Convergence (2021)
- CUER: Corrected Uniform Experience Replay For Off-policy Continuous Deep Reinforcement Learning Algorithms (2024)