Hierarchical Reinforcement Learning Via Advantage-weighted Information Maximization
2019 · Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
Abstract
Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task. In this paper, we propose an HRL method that learns a latent variable of a hierarchical policy using mutual information maximization. Our approach can be interpreted as a way to learn a discrete and latent representation of the state-action space. To learn option policies that correspond to modes of the advantage function, we introduce advantage-weighted importance sampling. In our HRL method, the gating policy learns to select option policies based on an option-value function, and these option policies are optimized based on the deterministic policy gradient method. This framework is derived by leveraging the analogy between a monolithic policy in stan
Related papers
- Hierarchical Reinforcement Learning With Advantage-based Auxiliary Rewards (2019)
- Hierarchical Decision Making Based On Structural Information Principles (2024)
- Boosting Hierarchical Reinforcement Learning With Meta-learning For Complex Task Adaptation (2024)
- A Provably Efficient Option-based Algorithm For Both High-level And Low-level Learning (2024)
- Bidirectional-reachable Hierarchical Reinforcement Learning With Mutually Responsive Policies (2024)
- Deep Reinforcement Learning From Hierarchical Preference Design (2023)
- Offline Hierarchical Reinforcement Learning Via Inverse Optimization (2024)
- Learning And Exploiting Multiple Subgoals For Fast Exploration In Hierarchical Reinforcement Learning (2019)