MEET: A Monte Carlo Exploration-exploitation Trade-off For Buffer Sampling
2022 · Julius Ott, Lorenzo Servadei, Jose Arjona-Medina, et al.
Abstract
Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation. Consequently, they cannot adapt the sampling strategies, including exploration and exploitation of transitions, to the complexity of the task. To address this, this paper proposes a new sampling strategy that leverages the exploration-exploitation trade-off. This is enabled by the uncertainty estimation of the Q-Value function, which guides the sampling to explore more significant transitions and, thus, learn a more efficient policy. Experiments on classical control environments demonstrate stable results across various environments. They show that the proposed method outperforms state-of-the-art sampling strategies for dense rewards w.r.t. convergence and peak performance by 26% on av
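As a rough illustration of the idea in the abstract, the sketch below scores each stored transition by an upper-confidence-bound-style priority built from several Q-value estimates. The mean stands in for exploitation and the spread across estimates for exploration. This is a generic sketch, not the formula from the paper: the function names, the `beta` weight, the use of an ensemble of estimates as the uncertainty signal, and the softmax normalization are all assumptions made for illustration.

```python
import numpy as np


def sampling_probabilities(q_estimates, beta=1.0):
    """Turn several Q-value estimates per transition into sampling probabilities.

    q_estimates: array-like of shape (n_estimates, n_transitions), e.g. the
        outputs of several Q-heads for each transition in the buffer
        (hypothetical setup, assumed for this sketch).
    beta: assumed weight on the uncertainty (exploration) term.
    """
    q = np.asarray(q_estimates, dtype=float)
    mean = q.mean(axis=0)   # exploitation: expected value of the transition
    std = q.std(axis=0)     # exploration: disagreement between the estimates
    score = mean + beta * std
    score = score - score.max()   # shift for a numerically stable softmax
    weights = np.exp(score)
    return weights / weights.sum()


def sample_indices(q_estimates, batch_size, beta=1.0, rng=None):
    """Draw buffer indices in proportion to the priorities above."""
    rng = np.random.default_rng() if rng is None else rng
    probs = sampling_probabilities(q_estimates, beta)
    return rng.choice(len(probs), size=batch_size, replace=True, p=probs)


if __name__ == "__main__":
    # Two estimates for three transitions; only the last one is uncertain.
    q = [[1.0, 1.0, 1.0],
         [1.0, 1.0, 3.0]]
    print(sampling_probabilities(q, beta=0.0))
    print(sampling_probabilities(q, beta=1.0))
    print(sample_indices(q, batch_size=4, rng=np.random.default_rng(0)))
```

Raising `beta` shifts probability mass toward transitions whose Q-value estimates disagree, while `beta=0` ranks transitions by their mean estimate alone.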
Related papers
- Uncertainty Quantification And Exploration For Reinforcement Learning (2019) 6.77
- Decoupled Exploration And Exploitation Policies For Sample-efficient Reinforcement Learning (2021) 0.00
- Large Batch Experience Replay (2021) 0.00
- A Provably Efficient Sample Collection Strategy For Reinforcement Learning (2020) 0.00
- Bayesian Bandits: Balancing The Exploration-exploitation Tradeoff Via Double Sampling (2017) 0.00
- Exploration Conscious Reinforcement Learning Revisited (2018) 0.00
- Information-directed Exploration For Deep Reinforcement Learning (2018) 0.00
- Frugal Actor-critic: Sample Efficient Off-policy Deep Reinforcement Learning Using Unique Experiences (2024) 0.00