Reinforcement Learning With Sparse Rewards Using Guidance From Offline Demonstration
2022 · Desik Rengarajan, Gargi Vaidya, Akshay Sarvesh, et al.
Abstract
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because the policy has to perform a large number of exploration actions before it gets any useful feedback that it can learn from. In this work, we address this challenging problem by developing an algorithm that exploits offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step by using the offline demonstration data. The key
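The two-step structure named in the abstract, a policy improvement step followed by a policy guidance step that uses offline demonstrations, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the chain environment, the tabular softmax policy, the REINFORCE-style improvement update, the log-likelihood guidance update, the decaying guidance weight, and all names and constants are hypothetical.

```python
import math
import random

# Hypothetical sparse-reward chain: start at state 0, reward 1 only at GOAL.
N_STATES, GOAL, HORIZON = 8, 7, 20
LEFT, RIGHT = 0, 1

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def rollout(theta, rng):
    """Run one episode; the return is 1.0 only if the goal is reached."""
    s, traj = 0, []
    for _ in range(HORIZON):
        probs = softmax(theta[s])
        a = RIGHT if rng.random() < probs[RIGHT] else LEFT
        traj.append((s, a))
        s = min(s + 1, GOAL) if a == RIGHT else max(s - 1, 0)
        if s == GOAL:
            return traj, 1.0
    return traj, 0.0

def improvement_step(theta, episodes, lr):
    """Assumed improvement step: REINFORCE update, applied sample by sample."""
    for traj, ret in episodes:
        for s, a in traj:
            probs = softmax(theta[s])
            for b in (LEFT, RIGHT):
                # d/d(logit_b) of log pi(a|s) for a softmax policy
                grad = (1.0 if b == a else 0.0) - probs[b]
                theta[s][b] += lr * ret * grad

def guidance_step(theta, demos, lr):
    """Assumed guidance step: raise the log-likelihood of demonstrated actions."""
    for s, a in demos:
        probs = softmax(theta[s])
        for b in (LEFT, RIGHT):
            grad = (1.0 if b == a else 0.0) - probs[b]
            theta[s][b] += lr * grad

def train(iterations=50, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]
    # Sub-optimal behavior policy (assumption): moves right 70% of the time.
    demos = [(s, RIGHT if rng.random() < 0.7 else LEFT)
             for s in range(GOAL) for _ in range(20)]
    for k in range(iterations):
        episodes = [rollout(theta, rng) for _ in range(10)]
        improvement_step(theta, episodes, lr=0.1)
        # Guidance weight decays so online learning dominates later on.
        guidance_step(theta, demos, lr=0.05 * 0.9 ** k)
    return theta
```

Calling train() returns the learned logits per state. In this toy setting the guidance step supplies a learning signal in early iterations, when online rollouts rarely reach the goal and the sparse return is mostly zero.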
Related papers
- Hundreds Guide Millions: Adaptive Offline Reinforcement Learning With Expert Guidance (2023) · 7.50
- Enhancing Online Reinforcement Learning With Meta-learned Objective From Offline Data (2025) · 0.00
- Guided Online Distillation: Promoting Safe Reinforcement Learning By Offline Demonstration (2023) · 4.52
- Viva: Video-trained Value Functions For Guiding Online RL From Diverse Data (2025) · 0.00
- Don't Change The Algorithm, Change The Data: Exploratory Data For Offline Reinforcement Learning (2022) · 0.00
- A Policy-guided Imitation Approach For Offline Reinforcement Learning (2022) · 0.00
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020) · 0.00
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023) · 2.56