Offline RL With Observation Histories: Analyzing And Improving Sample Complexity
2023 · Joey Hong, Anca Dragan, Sergey Levine
Abstract
Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a dataset consisting only of suboptimal trials. One way that this can happen is by "stitching" together the best parts of otherwise suboptimal trajectories that overlap on similar states, to create new behaviors where each individual state is in-distribution, but the overall returns are higher. However, in many interesting and complex applications, such as autonomous navigation and dialogue systems, the state is partially observed. Even worse, the state representation is unknown or not easy to define. In such cases, policies and value functions are often conditioned on observation histories instead of states. In these cases, it is not clear if the same kind of "stitching" is feasible at the level of observation histories, since two different trajectories would always have different histories, and thus "similar states" that might lead to effective stitching cannot be leveraged. Theoretically, we s
Related papers
- Model-based Trajectory Stitching For Improved Offline Reinforcement Learning (2022)
- Bridging Offline Reinforcement Learning And Imitation Learning: A Tale Of Pessimism (2021)
- Beyond Uniform Sampling: Offline Reinforcement Learning With Imbalanced Datasets (2023)
- Using Offline Data To Speed Up Reinforcement Learning In Procedurally Generated Environments (2023)
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)
- DiffStitch: Boosting Offline Reinforcement Learning With Diffusion-based Trajectory Stitching (2024)
- On Sample-efficient Offline Reinforcement Learning: Data Diversity, Posterior Sampling, And Beyond (2024)
- A Dataset Perspective On Offline Reinforcement Learning (2021)