Using Offline Data To Speed Up Reinforcement Learning In Procedurally Generated Environments
2023 · Alain Andres, Lukas Schäfer, Stefano V. Albrecht, et al.
Abstract
One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently d
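The two settings named in the abstract can be illustrated with a minimal sketch. It assumes a tabular softmax policy and a behaviour-cloning (negative log-likelihood) imitation loss. The names `bc_loss`, `bc_gradient_step`, `combined_loss` and the weight `il_weight` are hypothetical and not taken from the paper, which may use a different IL objective and RL algorithm.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of action logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bc_loss(policy_logits, offline_pairs):
    # Behaviour cloning: mean negative log-likelihood of the offline
    # (state, action) pairs under the current policy.
    nll = 0.0
    for state, action in offline_pairs:
        probs = softmax(policy_logits[state])
        nll -= math.log(probs[action])
    return nll / len(offline_pairs)

def bc_gradient_step(policy_logits, offline_pairs, lr=0.5):
    # One gradient-descent step on bc_loss. For a softmax policy the
    # gradient w.r.t. the logits is (probabilities - one-hot action).
    n = len(offline_pairs)
    grads = {s: [0.0] * len(l) for s, l in policy_logits.items()}
    for state, action in offline_pairs:
        probs = softmax(policy_logits[state])
        for a, p in enumerate(probs):
            grads[state][a] += (p - (1.0 if a == action else 0.0)) / n
    return {s: [x - lr * g for x, g in zip(l, grads[s])]
            for s, l in policy_logits.items()}

def combined_loss(rl_loss, il_loss, il_weight=1.0):
    # Setting (2): optimise the online RL loss and the IL loss together.
    # il_weight is an assumed hyperparameter, not a value from the paper.
    return rl_loss + il_weight * il_loss

# Setting (1): pre-train on offline demonstrations before any online RL.
logits = {0: [0.0, 0.0, 0.0], 1: [0.0, 0.0, 0.0]}  # 2 states, 3 actions
demos = [(0, 2), (1, 0), (0, 2)]                    # toy offline (state, action) data
loss_before = bc_loss(logits, demos)
for _ in range(50):
    logits = bc_gradient_step(logits, demos)
loss_after = bc_loss(logits, demos)
print(loss_before > loss_after)
```

In setting (1) the pre-trained `logits` would then initialise the online RL agent. In setting (2) `combined_loss` would be minimised at every update, with the RL term computed from online rollouts and the IL term from the offline trajectories.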
Related papers
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020) · 0.00
- Bridging The Gap Between Offline And Online Reinforcement Learning Evaluation Methodologies (2022) · 0.00
- Representation Matters: Offline Pretraining For Sequential Decision Making (2021) · 0.00
- Leveraging Offline Data In Online Reinforcement Learning (2022) · 0.00
- Beyond Uniform Sampling: Offline Reinforcement Learning With Imbalanced Datasets (2023) · 2.83
- Offline Vs. Online Learning In Model-based RL: Lessons For Data Collection Strategies (2025) · 0.00
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023) · 0.00
- An Optimistic Perspective On Offline Reinforcement Learning (2019) · 0.00