A Policy Gradient Approach For Optimization Of Smooth Risk Measures
2022 · Nithia Vijayan, Prashanth L. A.
Abstract
We propose policy gradient algorithms for solving a risk-sensitive reinforcement learning (RL) problem in on-policy as well as off-policy settings. We consider episodic Markov decision processes, and model the risk using the broad class of smooth risk measures of the cumulative discounted reward. We propose two template policy gradient algorithms that optimize a smooth risk measure in on-policy and off-policy RL settings, respectively. We derive non-asymptotic bounds that quantify the rate of convergence of our proposed algorithms to a stationary point of the smooth risk measure. As special cases, we establish that our algorithms apply to optimization of mean-variance and distortion risk measures, respectively.
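To make the objective concrete, here is a minimal sketch of how a mean-variance measure of the cumulative discounted reward could be estimated from sampled episodes. The abstract gives no formulas, so the "mean minus weighted variance" form, the function names, and the `risk_weight` parameter are illustrative assumptions rather than the paper's formulation.

```python
def discounted_return(rewards, gamma):
    """Cumulative discounted reward of one episode: sum over t of gamma^t * r_t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


def mean_variance_objective(returns, risk_weight):
    """Empirical mean of the returns minus risk_weight times their variance.

    Assumption: this trade-off form is one common way to write a
    mean-variance criterion; the paper may define it differently.
    """
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((g - mean) ** 2 for g in returns) / n  # population variance
    return mean - risk_weight * variance


# Two hypothetical episodes given as per-step reward sequences.
episodes = [[1.0, 1.0, 1.0], [0.0, 2.0, 4.0]]
returns = [discounted_return(ep, gamma=0.5) for ep in episodes]
objective = mean_variance_objective(returns, risk_weight=0.5)
```

A policy gradient method would adjust the policy parameters to increase an objective of this kind; the sketch only evaluates it for fixed samples and does not implement the algorithms proposed in the paper.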
Related papers
- Policy Gradient Methods For Distortion Risk Measures (2021)
- Smoothed Functional-based Gradient Algorithms For Off-policy Reinforcement Learning: A Non-asymptotic Viewpoint (2021)
- Smoothing Policies And Safe Policy Gradients (2019)
- A Risk-sensitive Approach To Policy Optimization (2022)
- Conditionally Elicitable Dynamic Risk Measures For Deep Reinforcement Learning (2022)
- An Alternative To Variance: Gini Deviation For Risk-averse Policy Gradient (2023)
- Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, And Separation Design (2022)
- Risk-sensitive Reinforcement Learning With Exponential Criteria (2022)