Offline Hierarchical Reinforcement Learning Via Inverse Optimization
2024 · Carolin Schmidt, Daniele Gammelli, James Harrison, et al.
Abstract
Hierarchical policies enable strong performance in many sequential decision-making problems, such as those with high-dimensional action spaces, those requiring long-horizon planning, and settings with sparse rewards. However, learning hierarchical policies from static offline datasets presents a significant challenge. Crucially, actions taken by higher-level policies may not be directly observable within hierarchical controllers, and the offline dataset might have been generated using a different policy structure, hindering the use of standard offline learning algorithms. In this work, we propose OHIO: a framework for offline reinforcement learning (RL) of hierarchical policies. Our framework leverages knowledge of the policy structure to solve the inverse problem, recovering the unobservable high-level actions that likely generated the observed data under our hierarchical policy. This approach constructs a dataset suitable for off-the-shelf offline training. We demonstrate
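The inverse problem described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a hypothetical linear low-level controller that steers toward a high-level setpoint with a known gain matrix, and it assumes the low-level action is observed in the dataset. The names `low_level_policy`, `recover_high_level_action`, and `gain` are introduced here purely for illustration.

```python
import numpy as np


def low_level_policy(state, high_level_action, gain):
    """Hypothetical low-level controller (an assumption of this sketch):
    proportional control toward the setpoint chosen by the high level."""
    return gain @ (high_level_action - state)


def recover_high_level_action(state, observed_action, gain):
    """Solve min_g || gain @ (g - state) - observed_action ||^2.

    Least squares returns the offset (g - state) that best explains the
    observed low-level action under the assumed controller structure.
    """
    delta, *_ = np.linalg.lstsq(gain, observed_action, rcond=None)
    return state + delta


# Toy data: the high-level setpoint is hidden, only (state, action) is logged.
gain = np.array([[2.0, 0.0], [0.0, 0.5]])
state = np.array([1.0, -1.0])
true_goal = np.array([3.0, 2.0])
observed_action = low_level_policy(state, true_goal, gain)

# Recover the unobserved high-level action from the logged transition.
recovered_goal = recover_high_level_action(state, observed_action, gain)

# A relabeled record that a standard offline RL algorithm could consume.
relabeled_record = {"state": state, "high_level_action": recovered_goal}
```

In this toy setting the gain matrix is invertible, so the recovered setpoint matches the hidden one exactly. With noisy data or a nonlinear controller, the same idea would presumably require a numerical solver and would yield only a likely estimate of the high-level action.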
Related papers
- Hierarchical Reinforcement Learning In Complex 3D Environments (2023) 0.00
- HIPODE: Enhancing Offline Reinforcement Learning With High-quality Synthetic Data From A Policy-decoupled Approach (2023) 0.00
- Latent Space Policies For Hierarchical Reinforcement Learning (2018) 0.00
- Bidirectional-reachable Hierarchical Reinforcement Learning With Mutually Responsive Policies (2024) 0.00
- Hierarchical Reinforcement Learning With Advantage-based Auxiliary Rewards (2019) 0.00
- Hypercube Policy Regularization Framework For Offline Reinforcement Learning (2024) 0.00
- When To Trust Your Simulator: Dynamics-aware Hybrid Offline-and-online Reinforcement Learning (2022) 2.26
- Multi-horizon Representations With Hierarchical Forward Models For Reinforcement Learning (2022) 0.00