Bootstrap Advantage Estimation For Policy Optimization In Reinforcement Learning
2022 · Md Masudur Rahman, Yexiang Xue
Abstract
This paper proposes an advantage estimation approach based on data augmentation for policy optimization. Whereas existing methods apply data augmentation to the input when learning the value and policy functions, our method uses data augmentation to compute a bootstrap advantage estimate. This Bootstrap Advantage Estimation (BAE) is then used to compute the gradient updates for the policy and value functions. To demonstrate the effectiveness of our approach, we conducted experiments on several environments from three benchmarks: Procgen, DeepMind Control, and PyBullet, which cover both image- and vector-based observations and both discrete and continuous action spaces. We observe that our method reduces the policy and value losses more than the Generalized Advantage Estimation (GAE) method and eventually improves cumulative return. Furthermore, our method performs better than two recently proposed data augmentation techniques (RAD and DRAC). Overall, our method perf
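For orientation, the GAE baseline named in the abstract can be sketched as below. The second function is only an illustrative guess at how augmentation might enter an advantage estimate: it averages GAE over value predictions made on several augmented views of the same trajectory. The abstract does not spell out the BAE formula, so `augmentation_averaged_gae`, `values_per_view`, and the default `gamma` and `lam` are assumptions, not the authors' definition.

```python
import numpy as np


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for a single trajectory.

    rewards: length-T sequence of rewards.
    values:  length-(T+1) sequence of value predictions; the last entry
             is the bootstrap value of the state after the final step.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    horizon = len(rewards)
    advantages = np.zeros(horizon)
    running = 0.0
    # Accumulate discounted TD residuals from the end of the trajectory.
    for t in reversed(range(horizon)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def augmentation_averaged_gae(rewards, values_per_view, gamma=0.99, lam=0.95):
    """Hypothetical variant (an assumption, not the paper's BAE):
    average GAE over value predictions from augmented observations.

    values_per_view: array of shape (num_views, T+1), one row of value
                     predictions per augmented view of the trajectory.
    """
    per_view = np.stack(
        [gae(rewards, view_values, gamma, lam) for view_values in values_per_view]
    )
    return per_view.mean(axis=0)
```

Because GAE is linear in the value predictions, this particular variant equals GAE applied to the view-averaged values; an estimator that differs from GAE more substantially would need a different construction.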
Related papers
- Direct Advantage Estimation (2021)
- Generalization Of Reinforcement Learning With Policy-aware Adversarial Data Augmentation (2021)
- Divergence-augmented Policy Optimization (2025)
- Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning (2021)
- When To Trust Your Model: Model-based Policy Optimization (2019)
- Policy Augmentation: An Exploration Strategy For Faster Convergence Of Deep Reinforcement Learning Algorithms (2021)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Data Efficient Training For Reinforcement Learning With Adaptive Behavior Policy Sharing (2020)