Offline Trajectory Optimization For Offline Reinforcement Learning
2024 · Ziqi Zhao, Zhaochun Ren, Liu Yang, et al.
Abstract
Offline reinforcement learning (RL) aims to learn policies without online exploration. To enlarge the training data, model-based offline RL learns a dynamics model that serves as a virtual environment, generating simulated data to enhance policy learning. However, existing data augmentation methods for offline RL suffer from (i) trivial improvement from short-horizon simulation; and (ii) a lack of evaluation and correction of the generated data, leading to low-quality augmentation. In this paper, we propose offline trajectory optimization for offline reinforcement learning (OTTO). The key motivation is to conduct long-horizon simulation and then use model uncertainty to evaluate and correct the augmented data. Specifically, we propose an ensemble of Transformers, a.k.a. World Transformers, to predict environment state dynamics and the reward function. Three strategies are proposed to use World Transformers to generate long-horizon trajectory simulation by perturbing the
Related papers
- Model-based Trajectory Stitching For Improved Offline Reinforcement Learning (2022)
- GTA: Generative Trajectory Augmentation With Guidance For Offline Reinforcement Learning (2024)
- Towards Data-driven Offline Simulations For Online Reinforcement Learning (2022)
- Planning To Go Out-of-distribution In Offline-to-online Reinforcement Learning (2023)
- Offline Safe Reinforcement Learning Using Trajectory Classification (2024)
- Model-based Offline Reinforcement Learning With Reliability-guaranteed Sequence Modeling (2025)
- Using Offline Data To Speed Up Reinforcement Learning In Procedurally Generated Environments (2023)
- TEA: Trajectory Encoding Augmentation For Robust And Transferable Policies In Offline Reinforcement Learning (2024)