Frugal Actor-critic: Sample Efficient Off-policy Deep Reinforcement Learning Using Unique Experiences
2024 · Nikhil Kumar Singh, Indranil Saha
Abstract
Efficient utilization of the replay buffer plays a significant role in off-policy actor-critic reinforcement learning (RL) algorithms used for model-free control policy synthesis for complex dynamical systems. We propose a method for achieving sample efficiency that selects unique samples and adds them to the replay buffer during exploration, with the goal of reducing the buffer size and maintaining the independent and identically distributed (IID) nature of the samples. Our method proceeds in three steps: selecting an important subset of the state variables from the experiences encountered during the initial phase of random exploration, partitioning the state space into a set of abstract states based on the selected important state variables, and finally selecting the experiences with unique state-reward combinations by using a kernel density estimator. We formally prove that the off-policy actor-critic algorithm incorporating the proposed method for unique experience
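The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how important variables are ranked, how the state space is partitioned, or how the kernel density estimate is thresholded. The correlation-based ranking, the uniform grid, the Gaussian kernel, and every name and parameter below (`select_important_vars`, `abstract_state`, `kde_density`, `UniqueReplayBuffer`, `bandwidth`, `threshold`) are assumptions made only for illustration.

```python
import numpy as np
from collections import defaultdict


def select_important_vars(states, rewards, k):
    """Pick k state variables by |Pearson correlation| with the reward.

    Assumed criterion; the abstract does not specify the ranking rule.
    """
    states = np.asarray(states, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    scores = np.zeros(states.shape[1])
    for j in range(states.shape[1]):
        col = states[:, j]
        # Constant columns carry no information and keep a score of 0.
        if col.std() > 0 and rewards.std() > 0:
            scores[j] = abs(np.corrcoef(col, rewards)[0, 1])
    return np.sort(np.argsort(scores)[::-1][:k])


def abstract_state(state, important, low, width):
    """Map a concrete state to a grid cell over the important variables only."""
    s = np.asarray(state, dtype=float)[important]
    return tuple(np.floor((s - low) / width).astype(int))


def kde_density(x, samples, bandwidth):
    """One-dimensional Gaussian kernel density estimate evaluated at x."""
    samples = np.asarray(samples, dtype=float)
    z = (x - samples) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum() / norm


class UniqueReplayBuffer:
    """Stores an experience only if its reward is novel for its abstract state."""

    def __init__(self, important, low, width, bandwidth=0.1, threshold=0.5):
        self.important = important
        self.low = low
        self.width = width
        self.bandwidth = bandwidth
        self.threshold = threshold  # assumed density cutoff for "already seen"
        self.rewards_by_cell = defaultdict(list)
        self.storage = []

    def add(self, state, action, reward, next_state):
        cell = abstract_state(state, self.important, self.low, self.width)
        seen = self.rewards_by_cell[cell]
        # A high reward density in this cell means a similar experience exists.
        if seen and kde_density(reward, seen, self.bandwidth) >= self.threshold:
            return False
        seen.append(reward)
        self.storage.append((state, action, reward, next_state))
        return True


# Usage: variable 0 drives the reward, variable 1 is constant.
important = select_important_vars(
    [[0, 5], [1, 5], [2, 5], [3, 5]], [0, 1, 2, 3], k=1
)
buffer = UniqueReplayBuffer(important, low=0.0, width=1.0)
buffer.add([0.2, 9.0], 0, 1.0, [0.3, 9.0])   # stored: first visit to the cell
buffer.add([0.7, -3.0], 0, 1.0, [0.8, -3.0])  # rejected: same cell, same reward
buffer.add([0.7, -3.0], 0, 5.0, [0.8, -3.0])  # stored: same cell, new reward
```

In this sketch the second experience is dropped even though its raw state differs, because it falls in the same abstract state and its reward is already densely represented there. That is how the buffer stays small under these assumptions.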
Related papers
- Sample Efficient Actor-critic With Experience Replay (2016)
- Sample-efficient Model-free Reinforcement Learning With Off-policy Critics (2019)
- Decoupled Exploration And Exploitation Policies For Sample-efficient Reinforcement Learning (2021)
- CUER: Corrected Uniform Experience Replay For Off-policy Continuous Deep Reinforcement Learning Algorithms (2024)
- Stratified Experience Replay: Correcting Multiplicity Bias In Off-policy Reinforcement Learning (2021)
- Replay For Safety (2021)
- Large Batch Experience Replay (2021)
- Handling Cost And Constraints With Off-policy Deep Reinforcement Learning (2023)