Return Capping: Sample-efficient CVaR Policy Gradient Optimisation
2025 · Harry Mead, Clarissa Costen, Bruno Lacerda, et al.
Abstract
When optimising for conditional value at risk (CVaR) using policy gradients (PG), current methods rely on discarding a large proportion of trajectories, resulting in poor sample efficiency. We propose a reformulation of the CVaR optimisation problem that caps the total return of trajectories used in training, rather than simply discarding them, and show that this is equivalent to the original problem if the cap is set appropriately. We show, with empirical results in a number of environments, that this reformulation results in consistently improved performance compared to baselines. We have made all our code available here: https://github.com/HarryMJMead/cvar-return-capping.
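The difference between discarding and capping can be illustrated with a small numerical sketch. This is not the authors' implementation: the function names `empirical_var_cvar` and `cap_returns` are hypothetical, and the sketch assumes the cap is set to the empirical value at risk (the alpha-quantile of the batch returns), which is one natural reading of "set appropriately". Under that assumption, the mean capped return equals alpha * CVaR + (1 - alpha) * cap, so with the cap held fixed, raising the expected capped return raises the CVaR, while every trajectory in the batch still carries a training signal.

```python
import numpy as np


def empirical_var_cvar(returns, alpha):
    """Empirical VaR and CVaR of a batch of returns at risk level alpha.

    VaR is taken as the k-th smallest return with k = ceil(alpha * N),
    and CVaR as the mean of the k smallest returns (an assumed convention).
    """
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    var = sorted_returns[k - 1]
    cvar = sorted_returns[:k].mean()
    return var, cvar


def cap_returns(returns, cap):
    """Cap every trajectory return at `cap` instead of discarding any."""
    return np.minimum(np.asarray(returns, dtype=float), cap)


# Hypothetical batch of eight trajectory returns, risk level alpha = 0.25.
batch = [8.0, 1.0, 5.0, 2.0, 7.0, 3.0, 6.0, 4.0]
alpha = 0.25

var, cvar = empirical_var_cvar(batch, alpha)
capped = cap_returns(batch, var)

# A discarding approach would keep only the returns at or below the VaR.
kept_by_discarding = int(np.sum(np.asarray(batch) <= var))

print("VaR:", var, "CVaR:", cvar)
print("mean capped return:", capped.mean())
print("alpha * CVaR + (1 - alpha) * VaR:", alpha * cvar + (1 - alpha) * var)
print("trajectories kept by discarding:", kept_by_discarding, "of", len(batch))
```

For this batch the VaR is 2.0, the CVaR is 1.5, and the mean capped return is 1.875, which matches 0.25 * 1.5 + 0.75 * 2.0. Discarding would keep only two of the eight trajectories, whereas capping keeps all eight.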
Related papers
- A Simple Mixture Policy Parameterization For Improving Sample Efficiency Of CVaR Optimization (2024)
- Trajectory-wise Control Variates For Variance Reduction In Policy Gradient Methods (2019)
- Clipped Action Policy Gradient (2018)
- Variance Reduction Based Partial Trajectory Reuse To Accelerate Policy Gradient Optimization (2022)
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)
- PAGE-PG: A Simple And Loopless Variance-reduced Policy Gradient Method With Probabilistic Gradient Estimation (2022)
- Sample Efficient Policy Gradient Methods With Recursive Variance Reduction (2019)
- Variance Reduction For Policy-gradient Methods Via Empirical Variance Minimization (2022)