BATS: Best Action Trajectory Stitching
2022 · Ian Char, Viraj Mehta, Adam Villaflor, et al.
Abstract
The problem of offline reinforcement learning focuses on learning a good policy from a log of environment interactions. Past efforts for developing algorithms in this area have revolved around introducing constraints to online reinforcement learning algorithms to ensure the actions of the learned policy are constrained to the logged data. In this work, we explore an alternative approach by planning on the fixed dataset directly. Specifically, we introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset. We do this by using learned dynamics models to plan short trajectories between states. Since exact value iteration can be performed on this constructed MDP, it becomes easy to identify which trajectories are advantageous to add to the MDP. Crucially, since most transitions in this MDP come from the logged data, trajectories from the MDP can be rolled out for long periods with confidence. We prove that this pr
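The core idea in the abstract, running exact value iteration on a tabular MDP built from logged transitions and then checking whether an added transition raises a state's value, can be illustrated with a toy sketch. This is not the authors' implementation: the state names, rewards, the deterministic edge representation (each edge stands in for one action), and the `value_iteration` helper are all assumptions made for illustration, and the "stitched" edge is simply inserted by hand rather than planned with a learned dynamics model.

```python
from collections import defaultdict


def value_iteration(transitions, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Exact value iteration on a deterministic tabular MDP.

    transitions: list of (state, reward, next_state) edges (hypothetical format).
    States with no outgoing edge are treated as terminal with value 0.
    """
    edges = defaultdict(list)
    states = set()
    for s, r, s_next in transitions:
        edges[s].append((r, s_next))
        states.update((s, s_next))

    values = {s: 0.0 for s in states}
    for _ in range(max_iters):
        delta = 0.0
        for s in states:
            if not edges[s]:
                continue  # terminal state: value stays 0
            # Bellman optimality backup over the outgoing edges of s.
            best = max(r + gamma * values[s_next] for r, s_next in edges[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    return values


# Two logged trajectories (made-up data): "a" earns little, "b" earns more.
logged = [
    ("a0", 0.0, "a1"), ("a1", 1.0, "a2"),
    ("b0", 0.0, "b1"), ("b1", 10.0, "b2"),
]
before = value_iteration(logged)

# Hypothetical stitched edge from a0 into the better trajectory at b1.
stitched = logged + [("a0", 0.0, "b1")]
after = value_iteration(stitched)

# The edge is worth adding if it raises the value of its source state.
improves = after["a0"] > before["a0"]
```

In this toy graph the added edge lets `a0` reach the higher-reward trajectory, so its value increases; comparing values before and after is one simple way to judge whether a candidate transition is advantageous.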
Related papers
- Model-based Trajectory Stitching For Improved Offline Reinforcement Learning (2022)
- Diffstitch: Boosting Offline Reinforcement Learning With Diffusion-based Trajectory Stitching (2024)
- Harnessing Mixed Offline Reinforcement Learning Datasets Via Trajectory Weighting (2023)
- Offline RL With Observation Histories: Analyzing And Improving Sample Complexity (2023)
- Learning From Good Trajectories In Offline Multi-agent Reinforcement Learning (2022)
- Offline Safe Reinforcement Learning Using Trajectory Classification (2024)
- Exploiting Action Impact Regularity And Exogenous State Variables For Offline Reinforcement Learning (2021)
- Constrained Latent Action Policies For Model-based Offline Reinforcement Learning (2024)