A Policy-guided Imitation Approach For Offline Reinforcement Learning
2022 · Haoran Xu, Li Jiang, Jianxiong Li, et al.
Abstract
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from erroneous off-policy evaluation. Imitation-based methods avoid off-policy evaluation but are too conservative to surpass the dataset. In this study, we propose an alternative approach, inheriting the training stability of imitation-style methods while still allowing logical out-of-distribution generalization. We decompose the conventional reward-maximizing policy in offline RL into a guide-policy and an execute-policy. During training, the guide-policy and execute-policy are learned using only data from the dataset, in a supervised and decoupled manner. During evaluation, the guide-policy guides the execute-policy by telling it where it should go so that the reward can be maximized, serving as the "Prophet". By doing so, our algorithm allows state-compositionality
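The guide/execute decomposition described in the abstract can be illustrated with a small tabular sketch. Everything in it is an assumption made for illustration: the 1-D chain dataset, the in-sample value backup used to score successor states, and the function names `fit_state_values`, `fit_guide_policy`, `fit_execute_policy`, and `rollout`. The abstract only states that both policies are learned from the dataset in a supervised, decoupled manner, so this is not the paper's training objective, just one simple way to make the idea concrete.

```python
from collections import Counter, defaultdict

# Hypothetical toy dataset of (state, action, next_state, reward) tuples
# on a 1-D chain with states 0..4; reaching state 4 pays reward 1.
DATASET = [
    (0, +1, 1, 0.0), (1, +1, 2, 0.0), (2, -1, 1, 0.0),  # trajectory A
    (2, +1, 3, 0.0), (3, +1, 4, 1.0), (3, -1, 2, 0.0),  # trajectory B
]
GAMMA = 0.9


def fit_state_values(data, sweeps=50):
    """Assumed scoring rule: back up values only through dataset transitions."""
    values = defaultdict(float)
    for _ in range(sweeps):
        best = defaultdict(lambda: float("-inf"))
        for s, _a, s_next, r in data:
            best[s] = max(best[s], r + GAMMA * values[s_next])
        for s, v in best.items():
            values[s] = v
    return values


def fit_guide_policy(data, values):
    """Guide-policy: state -> the dataset successor state with the best score."""
    guide, best = {}, {}
    for s, _a, s_next, r in data:
        score = r + GAMMA * values[s_next]
        if s not in best or score > best[s]:
            guide[s], best[s] = s_next, score
    return guide


def fit_execute_policy(data):
    """Execute-policy: (state, target state) -> most frequent dataset action."""
    counts = defaultdict(Counter)
    for s, a, s_next, _r in data:
        counts[(s, s_next)][a] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def rollout(start, guide, execute, max_steps=10):
    """Evaluation: the guide proposes where to go, the executor picks the action."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state not in guide:  # no outgoing data, e.g. the terminal state
            break
        target = guide[state]
        action = execute[(state, target)]
        state = state + action  # toy deterministic chain dynamics
        path.append(state)
    return path


if __name__ == "__main__":
    values = fit_state_values(DATASET)
    guide = fit_guide_policy(DATASET, values)
    execute = fit_execute_policy(DATASET)
    print(rollout(0, guide, execute))
```

In this toy setup, neither trajectory in the dataset runs from state 0 to state 4, yet the rollout from state 0 reaches state 4 because the guide selects a target state at every step and the executor only has to reproduce the dataset action for that (state, target) pair. A practical version would presumably replace the lookup tables with learned function approximators.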
Related papers
- Bridging Offline Reinforcement Learning And Imitation Learning: A Tale Of Pessimism (2021) · 0.00
- Curriculum Offline Imitation Learning (2021) · 0.00
- Dual RL: Unification And New Methods For Reinforcement And Imitation Learning (2023) · 0.00
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022) · 0.00
- Efficient Offline Reinforcement Learning: First Imitate, Then Improve (2024) · 1.91
- A Non-monolithic Policy Approach Of Offline-to-online Reinforcement Learning (2024) · 0.00
- MOReL: Model-based Offline Reinforcement Learning (2020) · 0.00
- Representation Matters: Offline Pretraining For Sequential Decision Making (2021) · 0.00