Efficient Reinforcement Learning From Demonstration Using Local Ensemble And Reparameterization With Split And Merge Of Expert Policies
2022 · Yu Wang, Fang Liu
Abstract
Current work on reinforcement learning (RL) from demonstrations often assumes that the demonstrations are samples from an optimal policy, an unrealistic assumption in practice. When demonstrations are generated by sub-optimal policies or contain sparse state-action pairs, a policy learned from them may mislead the agent into incorrect or non-local action decisions. We propose a new method called Local Ensemble and Reparameterization with Split and Merge of expert policies (LEARN-SAM) to improve efficiency and make better use of sub-optimal demonstrations. First, LEARN-SAM employs a new concept, the lambda-function, which uses a discrepancy measure between the current state and the demonstrated states to "localize" the weights of the expert policies during learning. Second, LEARN-SAM employs a split-and-merge (SAM) mechanism that separates the helpful parts of each expert demonstration and regroups them into new expert policies, so that the demonstrations are used selectively. Both th…
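The abstract does not give the exact form of the lambda-function, so the following is only a minimal sketch of the localization idea under stated assumptions. It assumes the discrepancy is the Euclidean distance from the current state to the nearest state in each expert's demonstrations, and that the localized weights come from a softmax over negative discrepancies with a hypothetical temperature `tau`. The names `discrepancy`, `lambda_weights`, and `ensemble_action` are illustrative and are not the authors' implementation.

```python
import math
from typing import List, Sequence

State = Sequence[float]
Action = Sequence[float]


def discrepancy(state: State, demo_states: List[State]) -> float:
    """Assumed discrepancy: distance from `state` to the nearest demonstrated
    state of a single expert (hypothetical choice, not from the paper)."""
    return min(math.dist(state, d) for d in demo_states)


def lambda_weights(state: State, experts: List[List[State]], tau: float = 1.0) -> List[float]:
    """Hypothetical lambda-function: experts whose demonstrations lie close to
    the current state receive larger weights; the weights sum to 1."""
    scores = [-discrepancy(state, demos) / tau for demos in experts]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_action(state: State, experts: List[List[State]],
                    expert_actions: List[Action], tau: float = 1.0) -> List[float]:
    """Blend each expert's proposed action with the localized weights."""
    w = lambda_weights(state, experts, tau)
    dim = len(expert_actions[0])
    return [sum(wk * a[i] for wk, a in zip(w, expert_actions)) for i in range(dim)]


# Two hypothetical experts whose demonstrations cover different regions.
experts = [[(0.0, 0.0), (1.0, 0.0)], [(5.0, 5.0), (6.0, 5.0)]]
print(lambda_weights((0.5, 0.0), experts))  # nearly all weight on expert 0
print(ensemble_action((0.5, 0.0), experts, [(1.0,), (-1.0,)]))
```

With this assumed form, an expert whose demonstrations are far from the current state contributes almost nothing, which is one way to avoid the "non-local" action decisions the abstract warns about; a smaller `tau` makes the weighting more local.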
Related papers
- Towards Applicable Reinforcement Learning: Improving The Generalization And Sample Efficiency With Policy Ensemble (2022) · 9.23
- SEERL: Sample Efficient Ensemble Reinforcement Learning (2020) · 2.26
- Optimizing Neurorobot Policy Under Limited Demonstration Data Through Preference Regret (2026) · 0.00
- Learning Safe Policies With Expert Guidance (2018) · 0.00
- Learning From Demonstrations With SACR2: Soft Actor-critic With Reward Relabeling (2021) · 0.00
- Reverse Forward Curriculum Learning For Extreme Sample And Demonstration Efficiency In Reinforcement Learning (2024) · 0.00
- Periodic Intra-ensemble Knowledge Distillation For Reinforcement Learning (2020) · 4.52
- Pretraining Deep Actor-critic Reinforcement Learning Algorithms With Expert Demonstrations (2018) · 0.00