Policy Gradient From Demonstration And Curiosity
2020 · Jie Chen, Wenjun Xu
Abstract
With reinforcement learning, an agent could learn complex behaviors from high-level abstractions of the task. However, exploration and reward shaping remained challenging for existing methods, especially in scenarios where the extrinsic feedback was sparse. Expert demonstrations have been investigated to solve these difficulties, but a tremendous number of high-quality demonstrations were usually required. In this work, an integrated policy gradient algorithm was proposed to boost exploration and facilitate intrinsic reward learning from only a limited number of demonstrations. We achieved this by reformulating the original reward function with two additional terms, where the first term measured the Jensen-Shannon divergence between the current policy and the expert, and the second term estimated the agent's uncertainty about the environment. The presented algorithm was evaluated on a range of simulated tasks with sparse extrinsic reward signals where only one single demonstrated trajectory
Related papers
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016) · 0.00
- Model-free Policy Learning With Reward Gradients (2021) · 0.00
- Reward-conditioned Policies (2019) · 0.00
- Learning Self-imitating Diverse Policies (2018) · 0.00
- Learning Safe Policies With Expert Guidance (2018) · 0.00
- Cooperative Multi-agent Policy Gradients With Sub-optimal Demonstration (2018) · 0.00
- Diversity-inducing Policy Gradient: Using Maximum Mean Discrepancy To Find A Set Of Diverse Policies (2019) · 8.35
- Behind The Myth Of Exploration In Policy Gradients (2024) · 0.00