An Alternative To Variance: Gini Deviation For Risk-averse Policy Gradient
2023 · Yudong Luo, Guiliang Liu, Pascal Poupart, et al.
Abstract
Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance. Recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose to use an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined shows that our algorithm can mitigate the limitations of variance-based risk measures and achieve high return with low risk, in terms of both variance and Gini deviation, when other methods fail to learn a reasonable policy.
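To make the "sensitivity to numerical scale" point concrete, here is a minimal sketch that compares variance with Gini deviation on a toy sample of returns. It assumes the standard definition of Gini deviation as half the expected absolute difference between two independent copies of the return, and it estimates that quantity by averaging over all ordered pairs of samples (including identical pairs). The function name `gini_deviation` and the sample values are illustrative only; this is not the authors' algorithm or code.

```python
import numpy as np


def gini_deviation(returns):
    """Plug-in estimate of 0.5 * E|X1 - X2| from a sample of returns.

    Averages |x_i - x_j| over all ordered pairs (i, j), including i == j,
    then halves the result. This is an illustrative estimator, not the
    estimator used in the paper.
    """
    x = np.asarray(returns, dtype=float)
    pairwise_abs_diff = np.abs(x[:, None] - x[None, :])
    return 0.5 * pairwise_abs_diff.mean()


# Hypothetical episode returns, then the same returns rescaled by 10.
returns = np.array([0.0, 1.0, 2.0, 3.0])
scaled = 10.0 * returns

gd_ratio = gini_deviation(scaled) / gini_deviation(returns)
var_ratio = np.var(scaled) / np.var(returns)

print("Gini deviation ratio after scaling:", gd_ratio)
print("Variance ratio after scaling:", var_ratio)
```

Under this estimator, multiplying the returns by a constant c multiplies Gini deviation by |c| but multiplies variance by c squared, which is one way to see why a variance penalty can be harder to tune when reward magnitudes change.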
Related papers
- Variance Reduction For Policy-gradient Methods Via Empirical Variance Minimization (2022)
- A Policy Gradient Approach For Optimization Of Smooth Risk Measures (2022)
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)
- An Analysis Of Measure-valued Derivatives For Policy Gradients (2022)
- Shrinking The Variance: Shrinkage Baselines For Reinforcement Learning With Verifiable Rewards (2025)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- An Empirical Analysis Of Measure-valued Derivatives For Policy Gradients (2021)
- Policy Gradient Methods For Distortion Risk Measures (2021)