Clipped Action Policy Gradient
2018 · Yasuhiro Fujita, Shin-Ichi Maeda
Abstract
Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.
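To make the idea in the abstract concrete, here is a minimal sketch, not the authors' implementation (that lives in the repository linked above). It assumes a scalar Gaussian policy with fixed standard deviation, differentiates only with respect to the mean, and uses a toy reward that depends on the clipped action. Function names such as score_capg and compare_estimators are hypothetical. The sketch replaces the usual score function with the gradient of the log-probability mass of the clipped region whenever the sampled action falls outside the bounds.

```python
import math
import numpy as np


def _pdf(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def score_conventional(u, mu, sigma):
    # d/dmu log N(u; mu, sigma^2), ignoring the action bounds.
    return (u - mu) / sigma**2


def score_capg(u, mu, sigma, low, high):
    # Clipped-action score (assumed scalar Gaussian policy):
    #   u <= low : d/dmu log P(u <= low)
    #   u >= high: d/dmu log P(u >= high)
    #   otherwise: the ordinary Gaussian score.
    z_low = (low - mu) / sigma
    z_high = (high - mu) / sigma
    lower = -_pdf(z_low) / (sigma * _cdf(z_low))
    upper = _pdf(z_high) / (sigma * (1.0 - _cdf(z_high)))
    inside = (u - mu) / sigma**2
    return np.where(u <= low, lower, np.where(u >= high, upper, inside))


def compare_estimators(mu=0.8, sigma=1.0, low=-1.0, high=1.0,
                       n=200_000, seed=0):
    # Monte Carlo comparison on a toy reward (an assumption for
    # illustration) that depends only on the clipped action.
    rng = np.random.default_rng(seed)
    u = rng.normal(mu, sigma, size=n)
    reward = -(np.clip(u, low, high) - 0.3) ** 2
    g_conv = reward * score_conventional(u, mu, sigma)
    g_capg = reward * score_capg(u, mu, sigma, low, high)
    return g_conv.mean(), g_conv.var(), g_capg.mean(), g_capg.var()


if __name__ == "__main__":
    m_conv, v_conv, m_capg, v_capg = compare_estimators()
    print("conventional: mean", m_conv, "variance", v_conv)
    print("clipped     : mean", m_capg, "variance", v_capg)
```

Under these assumptions the two sample means should agree up to Monte Carlo noise, while the clipped-action variant should show a smaller sample variance, because the reward is constant over each out-of-bound region and the per-sample score there is replaced by a constant.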
Related papers
- Marginal Policy Gradients: A Unified Family Of Estimators For Bounded Action Spaces With Applications (2018)
- Variance Reduction For Policy Gradient With Action-dependent Factorized Baselines (2018)
- Action-dependent Control Variates For Policy Optimization Via Stein's Identity (2017)
- All-action Policy Gradient Methods: A Numerical Integration Approach (2019)
- Trajectory-wise Control Variates For Variance Reduction In Policy Gradient Methods (2019)
- On Many-actions Policy Gradient (2022)
- Off-OAB: Off-policy Policy Gradient Method With Optimal Action-dependent Baseline (2024)
- Return Capping: Sample-efficient CVaR Policy Gradient Optimisation (2025)