Behavior Prior Representation Learning For Offline Reinforcement Learning
2022 · Hongyu Zang, Xin Li, Jie Yu, et al.
Abstract
Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent has access only to a fixed dataset and cannot interact with the environment. Past works have proposed workarounds based on pre-training state representations, followed by policy training. In this work, we introduce a simple yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation using any off-the-shelf offline RL algorithm. Theoretically, we prove that BPR comes with performance guarantees when integrated into algorithms that either have policy improvement guarantees (conservative algorithms) or produce lower bounds on policy values (pessimistic algorithms). Empirically, we show …
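The two-stage recipe in the abstract (pre-train a representation by behavior cloning, then freeze it and hand it to an offline RL learner) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes continuous actions, a linear encoder phi(s) = E s, a linear action head, and a mean-squared-error behavior-cloning loss, all chosen only for brevity. The names `bc_pretrain`, `bc_loss`, and `encode_transitions` are hypothetical, and the downstream offline RL algorithm is deliberately left out; stage 2 only shows how transitions are re-expressed in the frozen feature space before a learner would consume them.

```python
import random

# Hypothetical sketch: linear encoder + linear head + MSE loss are assumptions,
# not the architecture or loss used in the BPR paper.

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def bc_loss(E, H, dataset):
    """Mean squared behavior-cloning error of H applied to phi(s) = E s."""
    total = 0.0
    for s, a in dataset:
        a_hat = matvec(H, matvec(E, s))
        total += sum((p - t) ** 2 for p, t in zip(a_hat, a))
    return total / len(dataset)

def bc_pretrain(dataset, state_dim, action_dim, feat_dim, lr=0.05, epochs=200, seed=0):
    """Stage 1: fit encoder E and action head H by full-batch gradient descent
    on the behavior-cloning loss. Returns (E, H)."""
    rng = random.Random(seed)
    E = rand_matrix(feat_dim, state_dim, rng)
    H = rand_matrix(action_dim, feat_dim, rng)
    n = len(dataset)
    for _ in range(epochs):
        gE = [[0.0] * state_dim for _ in range(feat_dim)]
        gH = [[0.0] * feat_dim for _ in range(action_dim)]
        for s, a in dataset:
            z = matvec(E, s)                                 # representation phi(s)
            err = [p - t for p, t in zip(matvec(H, z), a)]   # predicted minus dataset action
            dz = [sum(H[i][j] * err[i] for i in range(action_dim)) for j in range(feat_dim)]
            for i in range(action_dim):
                for j in range(feat_dim):
                    gH[i][j] += err[i] * z[j]
            for j in range(feat_dim):
                for k in range(state_dim):
                    gE[j][k] += dz[j] * s[k]
        step = 2.0 * lr / n                                  # gradient of the mean squared error
        for i in range(action_dim):
            for j in range(feat_dim):
                H[i][j] -= step * gH[i][j]
        for j in range(feat_dim):
            for k in range(state_dim):
                E[j][k] -= step * gE[j][k]
    return E, H

def encode_transitions(E, transitions):
    """Stage 2 input: replace raw states with frozen features phi(s) = E s.
    A downstream offline RL learner (not implemented here) sees only these."""
    return [(matvec(E, s), a, r, matvec(E, s2)) for s, a, r, s2 in transitions]

if __name__ == "__main__":
    rng = random.Random(1)
    A = [[0.5, -0.2, 0.1, 0.0], [0.0, 0.3, -0.4, 0.2]]     # hypothetical behavior policy
    states = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(64)]
    data = [(s, matvec(A, s)) for s in states]
    E, H = bc_pretrain(data, state_dim=4, action_dim=2, feat_dim=3)
    print("BC loss after pre-training:", round(bc_loss(E, H, data), 4))
```

Freezing E after stage 1 is the central design choice: the offline RL algorithm would train its own actor and critic on top of phi(s) but never update the encoder. A realistic setup would likely replace the linear maps with neural networks (for example a convolutional encoder for pixel inputs) trained with minibatch stochastic gradient descent.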
Related papers
- Representation Matters: Offline Pretraining For Sequential Decision Making (2021)
- Know Your Boundaries: The Necessity Of Explicit Behavioral Cloning In Offline RL (2022)
- When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning? (2022)
- Deployment-efficient Reinforcement Learning Via Model-based Offline Optimization (2020)
- Behavioral Priors And Dynamics Models: Improving Performance And Domain Transfer In Offline RL (2021)
- BRAC+: Improved Behavior Regularized Actor Critic For Offline Reinforcement Learning (2021)
- Behavior Estimation From Multi-source Data For Offline Reinforcement Learning (2022)
- Adaptive Behavior Cloning Regularization For Stable Offline-to-online Reinforcement Learning (2022)