On The Role Of Weight Sharing During Deep Option Learning
2019 · Matthew Riemer, Ignacio Cases, Clemens Rosenbaum, et al.
Abstract
The options framework is a popular approach for building temporally extended actions in reinforcement learning. In particular, the option-critic architecture provides general purpose policy gradient theorems for learning actions from scratch that are extended in time. However, past work makes the key assumption that each of the components of option-critic has independent parameters. In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, it is always violated in practice for the deep function approximation setting. We thus reconsider this assumption and consider more general extensions of option-critic and hierarchical option-critic training that optimize for the full architecture with each update. It turns out that not assuming parameter independence challenges a belief in prior work that training the policy over options can be disentangled from the dynamics of the underlying options. In fact, learning can be sped
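The sketch below makes the parameter-independence point concrete. It is a minimal NumPy illustration, not the authors' implementation. It assumes a hypothetical network, `SharedOptionCritic`, in which the policy-over-options head (`W_q`), the termination head (`W_b`) and the intra-option policy heads (`W_pi`) all read from one shared `tanh` trunk (`W_s`). A single termination step, which subtracts ∂β/∂θ scaled by an assumed advantage A(s, o) in the style of option-critic's termination update, never touches any intra-option policy head. It still changes that policy's action probabilities, because the gradient flows into the shared trunk. With fully separate parameters per component, as in the tabular case, this coupling would not exist.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


class SharedOptionCritic:
    """Hypothetical option-critic network whose heads all share one trunk."""

    def __init__(self, obs_dim, hidden, n_options, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.W_s = rng.normal(0, 0.5, (hidden, obs_dim))                 # shared trunk
        self.W_q = rng.normal(0, 0.5, (n_options, hidden))               # Q_Omega head (policy over options)
        self.W_b = rng.normal(0, 0.5, (n_options, hidden))               # termination head
        self.W_pi = rng.normal(0, 0.5, (n_options, n_actions, hidden))   # intra-option policy heads

    def features(self, x):
        return np.tanh(self.W_s @ x)

    def option_values(self, x):
        return self.W_q @ self.features(x)

    def termination(self, x, o):
        return sigmoid(self.W_b[o] @ self.features(x))

    def intra_option_policy(self, x, o):
        return softmax(self.W_pi[o] @ self.features(x))

    def termination_step(self, x, o, advantage, lr=0.1):
        """theta <- theta - lr * dbeta/dtheta * A(s, o), with theta = (W_b[o], W_s).

        Only the termination output is differentiated, but because W_s is
        shared, the update also moves every other head's input features.
        """
        h = self.features(x)
        beta = sigmoid(self.W_b[o] @ h)
        dbeta_dz = beta * (1.0 - beta)
        grad_Wb_o = dbeta_dz * h
        dbeta_dh = dbeta_dz * self.W_b[o]                     # computed before W_b is updated
        grad_Ws = np.outer(dbeta_dh * (1.0 - h ** 2), x)      # chain rule through tanh
        self.W_b[o] -= lr * advantage * grad_Wb_o
        self.W_s -= lr * advantage * grad_Ws


if __name__ == "__main__":
    net = SharedOptionCritic(obs_dim=4, hidden=8, n_options=2, n_actions=3)
    x = np.array([0.5, -1.0, 0.25, 2.0])
    W_pi_before = net.W_pi.copy()
    pi_before = net.intra_option_policy(x, o=1)
    net.termination_step(x, o=0, advantage=1.0)               # update option 0's termination only
    pi_after = net.intra_option_policy(x, o=1)
    print(np.array_equal(W_pi_before, net.W_pi))              # True: policy heads untouched
    print(np.abs(pi_after - pi_before).max() > 0)             # True: option 1's policy still moved
```

The design choice being illustrated is the shared trunk: it is what makes the per-component gradient theorems, derived under an independence assumption, describe only part of the effect of each update.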
Related papers
- Parameter Sharing Deep Deterministic Policy Gradient For Cooperative Multi-agent Reinforcement Learning (2017)
- Classifying Options For Deep Reinforcement Learning (2016)
- Interpretable Option Discovery Using Deep Q-learning And Variational Autoencoders (2022)
- Reusable Options Through Gradient-based Meta Learning (2022)
- Attention Option-critic (2022)
- Discovering Hierarchies Using Imitation Learning From Hierarchy Aware Policies (2018)
- SOAP-RL: Sequential Option Advantage Propagation For Reinforcement Learning In POMDP Environments (2024)
- Optimal Options For Multi-task Reinforcement Learning Under Time Constraints (2020)