Policy Gradient Methods For Distortion Risk Measures
2021 Β· Nithia Vijayan, Prashanth L. A
Abstract
We propose policy gradient algorithms that learn risk-sensitive policies in a reinforcement learning (RL) framework. Our algorithms, one for the on-policy and one for the off-policy RL setting, maximize the distortion risk measure (DRM) of the cumulative reward in an episodic Markov decision process. We derive a variant of the policy gradient theorem that caters to the DRM objective and integrate it with a likelihood ratio-based gradient estimation scheme. We also derive non-asymptotic bounds that establish the convergence of our proposed algorithms to an approximate stationary point of the DRM objective.
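One way to see how a likelihood ratio scheme applies to a DRM objective is sketched below. This is a minimal illustration under stated assumptions, not the paper's algorithm or estimator. For a nondecreasing distortion function g on [0, 1] with g(0) = 0 and g(1) = 1, the DRM of the return R is rho_g(R) = ∫_0^∞ g(1 - F_R(x)) dx + ∫_{-∞}^0 [g(1 - F_R(x)) - 1] dx. Assuming g is differentiable and returns are bounded, grad rho_g = -∫ g'(1 - F_θ(x)) grad F_θ(x) dx, and the likelihood ratio trick gives grad F_θ(x) = E[1{R <= x} S] with the score S = Σ_t grad log π_θ(a_t | s_t). Swapping the order of integration and using E[S] = 0 to subtract a constant baseline c gives grad rho_g = E[S · ∫_c^R g'(1 - F_θ(x)) dx]. The functions `empirical_drm`, `drm_policy_gradient`, `toy_batch`, and the one-step toy task are hypothetical. The code covers only the on-policy case and uses the empirical CDF of the sampled returns in place of F_θ.

```python
import math
import random
from typing import Callable, List, Sequence


def empirical_drm(returns: Sequence[float], g: Callable[[float], float]) -> float:
    """Plug-in DRM of the empirical return distribution.

    With returns sorted ascending, the i-th one (0-indexed) gets weight
    g((n - i) / n) - g((n - i - 1) / n). For tied returns these weights
    telescope to g(P(R >= r)) - g(P(R > r)), so ties need no special case.
    """
    xs = sorted(returns)
    n = len(xs)
    return sum(x * (g((n - i) / n) - g((n - i - 1) / n)) for i, x in enumerate(xs))


def drm_policy_gradient(
    returns: Sequence[float],
    scores: Sequence[Sequence[float]],
    g_prime: Callable[[float], float],
) -> List[float]:
    """Likelihood-ratio estimate of grad rho_g = E[S * phi(R)].

    phi(r) integrates g'(1 - F(x)) from the smallest sampled return up to r,
    with F the empirical CDF. scores[j] is the score vector S of episode j,
    i.e. sum_t grad log pi(a_t | s_t). The minimum-return baseline is a
    hypothetical, data-dependent choice made here for simplicity.
    """
    n = len(returns)
    order = sorted(range(n), key=lambda j: returns[j])
    phi = [0.0] * n
    acc = 0.0
    for k in range(1, n):
        lo, hi = returns[order[k - 1]], returns[order[k]]
        # For x in [lo, hi) exactly k samples are <= x, so 1 - F(x) = (n - k) / n.
        # When lo == hi (tied returns) the interval is empty and adds nothing.
        acc += g_prime((n - k) / n) * (hi - lo)
        phi[order[k]] = acc
    dim = len(scores[0])
    return [sum(scores[j][d] * phi[j] for j in range(n)) / n for d in range(dim)]


def toy_batch(theta: float, n: int, rng: random.Random):
    """Hypothetical one-step task: action 1 pays 1, action 0 pays 0,
    and pi_theta(a = 1) = sigmoid(theta)."""
    p = 1.0 / (1.0 + math.exp(-theta))
    returns, scores = [], []
    for _ in range(n):
        a = 1 if rng.random() < p else 0
        returns.append(float(a))
        scores.append([a - p])  # d/dtheta log pi_theta(a) for this Bernoulli policy
    return returns, scores


def convex_g(u: float) -> float:
    # Convex distortion: puts more weight on low returns (risk-averse for rewards).
    return u * u


def convex_g_prime(u: float) -> float:
    return 2.0 * u


if __name__ == "__main__":
    rng = random.Random(0)
    R, S = toy_batch(0.0, 20000, rng)
    # For this task rho_g = p**2, so at theta = 0 (p = 0.5) both the DRM and
    # its exact gradient 2p * p * (1 - p) equal 0.25; the estimates should be close.
    print(empirical_drm(R, convex_g))
    print(drm_policy_gradient(R, S, convex_g_prime))
```

With g(u) = u (so g' is identically 1), phi(r) reduces to r minus the smallest sampled return, and the estimator becomes REINFORCE with that return as a baseline. Because that baseline depends on the same batch, it introduces a small bias; a fixed constant baseline avoids this. An off-policy variant would additionally need importance weights, which this sketch omits.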
Related papers
- Policy Newton Methods For Distortion Riskmetrics (2025)
- A Policy Gradient Approach For Optimization Of Smooth Risk Measures (2022)
- Policy Gradients For Cumulative Prospect Theory In Reinforcement Learning (2024)
- A Risk-sensitive Approach To Policy Optimization (2022)
- Why Policy Gradient Algorithms Work For Undiscounted Total-reward Mdps (2025)
- Lrt-diffusion: Calibrated Risk-aware Guidance For Diffusion Policies (2025)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Off-policy Policy Gradient Algorithms By Constraining The State Distribution Shift (2019)