Monte Carlo Augmented Actor-critic For Sparse Reward Deep Reinforcement Learning From Suboptimal Demonstrations
2022 · Albert Wilcox, Ashwin Balakrishna, Jules Dedieu, et al.
Abstract
Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions. This sparsity poses new exploration challenges. One common way to address this problem is to use demonstrations to provide an initial signal about regions of the state space with high rewards. However, prior algorithms for RL from demonstrations introduce significant complexity and many hyperparameters, making them hard to implement and tune. We introduce Monte Carlo Augmented Actor-Critic (MCAC), a parameter-free modification to standard actor-critic algorithms which initializes the replay buffer with demonstrations and computes a modified \(Q\)-value by taking the maximum of the standard temporal difference (TD) target and a Monte Carlo estimate of the reward-to-go. This encourages exploration in the neighborhood of high-performing trajectories by encouraging high \(Q\)-values in corresponding re
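The target computation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single completed trajectory, a discount factor `gamma`, and critic estimates for the next state passed in as plain numbers. The function names `reward_to_go` and `mcac_targets` are hypothetical.

```python
from typing import List, Sequence


def reward_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted Monte Carlo return from each step to the end of the trajectory."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mcac_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Per-step critic target: max(TD target, Monte Carlo reward-to-go).

    next_q_values[t] stands in for the critic's estimate at the next state
    (an assumption of this sketch; a real agent would query a target network).
    """
    mc_returns = reward_to_go(rewards, gamma)
    targets = []
    for r, q_next, done, mc in zip(rewards, next_q_values, dones, mc_returns):
        # Standard one-step TD target; no bootstrapping past a terminal step.
        td_target = r + (0.0 if done else gamma * q_next)
        targets.append(max(td_target, mc))
    return targets


# Sparse-reward demonstration: reward only at the final step, and an
# untrained critic that predicts 0 everywhere.
demo_targets = mcac_targets(
    rewards=[0.0, 0.0, 1.0],
    next_q_values=[0.0, 0.0, 0.0],
    dones=[False, False, True],
    gamma=0.5,
)
```

In this toy case the TD targets for the first two steps are 0 because the critic has not yet learned anything, while the Monte Carlo term propagates the final reward back along the whole demonstration, which is the effect the abstract attributes to the maximum.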
Related papers
- Learning From Demonstrations With SACR2: Soft Actor-critic With Reward Relabeling (2021) · 0.00
- Effects Of Sparse Rewards Of Different Magnitudes In The Speed Of Learning Of Model-based Actor Critic Methods (2020) · 0.00
- Beyond Exponentially Fast Mixing In Average-reward Reinforcement Learning Via Multi-level Monte Carlo Actor-critic (2023) · 0.00
- Learning Long-term Reward Redistribution Via Randomized Return Decomposition (2021) · 0.00
- Boosting Exploration In Actor-critic Algorithms By Incentivizing Plausible Novel States (2022) · 5.24
- A Sharper Global Convergence Analysis For Average Reward Reinforcement Learning Via An Actor-critic Approach (2024) · 0.00
- Efficient Exploration In Deep Reinforcement Learning: A Novel Bayesian Actor-critic Algorithm (2024) · 0.00
- Distributional Soft Actor-critic: Off-policy Reinforcement Learning For Addressing Value Estimation Errors (2020) · 17.77