Learning From Demonstrations With SACR2: Soft Actor-critic With Reward Relabeling
2021 · Jesus Bujalance Martin, Raphael Chekroun, Fabien Moutarde
Abstract
In recent years, deep reinforcement learning (DRL) has made successful incursions into complex decision-making applications such as robotics, autonomous driving, and video games. Off-policy algorithms tend to be more sample-efficient than their on-policy counterparts, and can additionally benefit from any off-policy data stored in the replay buffer. Expert demonstrations are a popular source of such data: the agent is exposed to successful states and actions early on, which can accelerate the learning process and improve performance. In the past, multiple ideas have been proposed to make good use of the demonstrations in the buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We carry out a study to evaluate several of these ideas in isolation, to see which of them have the most significant impact. We also present a new method for sparse-reward tasks, based on a reward bonus given to demonstrations and successful episodes. First, we give a rewa
Related papers
- Monte Carlo Augmented Actor-critic For Sparse Reward Deep Reinforcement Learning From Suboptimal Demonstrations (2022) 0.00
- Boosting Soft Actor-critic: Emphasizing Recent Experience Without Forgetting The Past (2019) 0.00
- Learning Without Time-based Embodiment Resets In Soft-actor Critic (2025) 0.00
- DSAC: Distributional Soft Actor-critic For Risk-sensitive Reinforcement Learning (2020) 7.81
- Revisiting Discrete Soft Actor-critic (2022) 0.00
- DR-SAC: Distributionally Robust Soft Actor-critic For Reinforcement Learning Under Uncertainty (2025) 0.00
- Discriminator Soft Actor Critic Without Extrinsic Rewards (2020) 3.58
- Improved Soft Actor-critic: Mixing Prioritized Off-policy Samples With On-policy Experience (2021) 0.00