Beyond Uniform Sampling: Offline Reinforcement Learning With Imbalanced Datasets
2023 · Zhang-Wei Hong, Aviral Kumar, Sathwik Karnik, et al.
Abstract
Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of suboptimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to "good data" rather than all actions in t
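The abstract is cut off before it describes the proposed sampling strategy, so the sketch below is only a generic illustration of what non-uniform sampling can look like, not the authors' method. It draws trajectories with probability that increases with their return, using a softmax over min-max normalized returns. The function names, the softmax weighting, and the `temperature` parameter are all assumptions introduced for this example.

```python
import math
import random


def return_weighted_probs(returns, temperature=1.0):
    """Sampling probabilities that favor high-return trajectories.

    Hypothetical scheme (an assumption, not taken from the paper):
    softmax over min-max normalized returns, sharpened by `temperature`.
    """
    lo, hi = min(returns), max(returns)
    span = hi - lo
    if span == 0:
        # All trajectories are equally good: fall back to uniform sampling.
        return [1.0 / len(returns)] * len(returns)
    logits = [(r - lo) / span / temperature for r in returns]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_trajectory_indices(returns, batch_size, temperature=1.0, rng=None):
    """Draw `batch_size` trajectory indices, with replacement, by return weight."""
    rng = rng or random.Random()
    probs = return_weighted_probs(returns, temperature)
    return rng.choices(range(len(returns)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Imbalanced toy dataset: nine low-return trajectories and one high-return one.
    toy_returns = [1.0] * 9 + [10.0]
    print(return_weighted_probs(toy_returns, temperature=0.1))
    print(sample_trajectory_indices(toy_returns, batch_size=8, temperature=0.1))
```

Under uniform sampling the single high-return trajectory in the toy dataset would be drawn about one time in ten; with a low temperature it dominates the batch, so a policy constrained to stay close to the sampled data is constrained mostly to the better trajectory.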
Related papers
- Offline Reinforcement Learning With Imbalanced Datasets (2023)
- Harnessing Mixed Offline Reinforcement Learning Datasets Via Trajectory Weighting (2023)
- Using Offline Data To Speed Up Reinforcement Learning In Procedurally Generated Environments (2023)
- Bridging Offline Reinforcement Learning And Imitation Learning: A Tale Of Pessimism (2021)
- An Optimistic Perspective On Offline Reinforcement Learning (2019)
- Finetuning From Offline Reinforcement Learning: Challenges, Trade-offs And Practical Solutions (2023)
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)
- Offline Safe Reinforcement Learning Using Trajectory Classification (2024)