Harnessing Mixed Offline Reinforcement Learning Datasets Via Trajectory Weighting
2023 · Zhang-Wei Hong, Pulkit Agrawal, Rémi Tachet Des Combes, et al.
Abstract
Most offline reinforcement learning (RL) algorithms return a target policy maximizing a trade-off between (1) the expected performance gain over the behavior policy that collected the dataset, and (2) the risk stemming from the out-of-distribution-ness of the induced state-action occupancy. It follows that the performance of the target policy is strongly related to the performance of the behavior policy and, thus, the trajectory return distribution of the dataset. We show that in mixed datasets consisting of mostly low-return trajectories and minor high-return trajectories, state-of-the-art offline RL algorithms are overly restrained by low-return trajectories and fail to exploit high-performing trajectories to the fullest. To overcome this issue, we show that, in deterministic MDPs with stochastic initial states, the dataset sampling can be re-weighted to induce an artificial dataset whose behavior policy has a higher return. This re-weighted sampling strategy may be combined with any
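The re-weighting idea described in the abstract can be illustrated with a small sketch. The abstract does not state the weighting rule, so the softmax over min-max normalized returns used here, the `temperature` parameter, and the function names `trajectory_weights` and `sample_batch` are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def trajectory_weights(returns, temperature=0.1):
    """Sampling weights that favor high-return trajectories.

    Assumed scheme (not taken from the paper): softmax over
    min-max normalized returns, sharpened by `temperature`.
    """
    lo, hi = min(returns), max(returns)
    if hi == lo:
        # All trajectories are equally good: fall back to uniform sampling.
        return [1.0 / len(returns)] * len(returns)
    scaled = [(r - lo) / (hi - lo) / temperature for r in returns]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_batch(trajectories, weights, batch_size, rng):
    """Pick a trajectory by weight, then one transition uniformly within it."""
    picked = rng.choices(trajectories, weights=weights, k=batch_size)
    return [rng.choice(traj) for traj in picked]


# Toy mixed dataset: two low-return trajectories and one high-return one.
# Each transition is just a label here; a real dataset would store
# (state, action, reward, next_state) tuples.
trajectories = [["a0", "a1"], ["b0", "b1", "b2"], ["c0", "c1"]]
returns = [1.0, 2.0, 10.0]

weights = trajectory_weights(returns)
batch = sample_batch(trajectories, weights, batch_size=8, rng=random.Random(0))
```

Under this scheme the expected return of a sampled trajectory is higher than under uniform sampling, which is the sense in which the re-weighted dataset behaves as if it came from a better behavior policy. The sampled batches could then be fed to an offline RL learner in place of uniformly drawn ones.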
Related papers
- Beyond Uniform Sampling: Offline Reinforcement Learning With Imbalanced Datasets (2023)
- Offline Safe Reinforcement Learning Using Trajectory Classification (2024)
- In-dataset Trajectory Return Regularization For Offline Preference-based Reinforcement Learning (2024)
- Provably Efficient Offline Reinforcement Learning With Trajectory-wise Reward (2022)
- Model-based Trajectory Stitching For Improved Offline Reinforcement Learning (2022)
- Enhancing Offline Reinforcement Learning With Curriculum Learning-based Trajectory Valuation (2025)
- Prioritized Trajectory Replay: A Replay Memory For Data-driven Reinforcement Learning (2023)
- Using Offline Data To Speed Up Reinforcement Learning In Procedurally Generated Environments (2023)