Policy Gradients For Cumulative Prospect Theory In Reinforcement Learning
2024 · Olivier Lepel, Anas Barakat
Abstract
We derive a policy gradient theorem for Cumulative Prospect Theory (CPT) objectives in finite-horizon Reinforcement Learning (RL), generalizing the standard policy gradient theorem and encompassing distortion-based risk objectives as special cases. Motivated by behavioral economics, CPT combines an asymmetric utility transformation around a reference point with probability distortion. Building on our theorem, we design a first-order policy gradient algorithm for CPT-RL using a Monte Carlo gradient estimator based on order statistics. We establish statistical guarantees for the estimator and prove asymptotic convergence of the resulting algorithm to first-order stationary points of the (generally non-convex) CPT objective. Simulations illustrate qualitative behaviors induced by CPT and compare our first-order approach to existing zeroth-order methods.
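To make the two CPT ingredients named in the abstract concrete, the following is a minimal sketch of how a CPT value could be estimated from sampled returns using order statistics. It is not the gradient estimator of the paper, which the abstract does not spell out. It only estimates the objective itself, using the well-known quantile-style sample estimator. The power utility, the loss-aversion factor `lam`, the inverse-S weighting function `tk_weight`, and all default parameter values are assumptions borrowed from Tversky and Kahneman's classical parameterization, not taken from this abstract; the function names are hypothetical.

```python
import numpy as np

def tk_weight(p, gamma=0.61):
    """Inverse-S probability distortion (assumed form, not from the abstract)."""
    p = np.asarray(p, dtype=float)
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def cpt_estimate(returns, ref=0.0, alpha=0.88, lam=2.25,
                 w_plus=tk_weight,
                 w_minus=lambda p: tk_weight(p, 0.69)):
    """Order-statistics estimate of a CPT value from sampled returns.

    Gains and losses are measured relative to the reference point `ref`;
    losses are scaled by `lam` (asymmetric utility), and each sorted sample
    is weighted by an increment of the distorted tail probability.
    """
    x = np.sort(np.asarray(returns, dtype=float)) - ref  # ascending order statistics
    n = x.size
    gains = np.maximum(x, 0.0) ** alpha          # utility of gains
    losses = lam * np.maximum(-x, 0.0) ** alpha  # utility of losses
    i = np.arange(1, n + 1)
    # Increments of the distorted upper-tail probabilities (gains).
    wp = w_plus((n + 1 - i) / n) - w_plus((n - i) / n)
    # Increments of the distorted lower-tail probabilities (losses).
    wm = w_minus(i / n) - w_minus((i - 1) / n)
    return float(np.sum(gains * wp) - np.sum(losses * wm))

# With identity utility and identity distortion, the estimate reduces to
# the sample mean, i.e. the usual expected-return objective.
identity = lambda p: p
samples = [1.0, -2.0, 3.0, 0.5]
risk_neutral = cpt_estimate(samples, alpha=1.0, lam=1.0,
                            w_plus=identity, w_minus=identity)
print(risk_neutral, np.mean(samples))
```

Under these assumed parameters, a fair gamble between winning and losing the same amount receives a negative CPT value, which reflects the loss aversion built into the asymmetric utility.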
Related papers
- Learning Deterministic Policies With Policy Gradients In Constrained Markov Decision Processes (2025)
- Policy Gradient Methods For Distortion Risk Measures (2021)
- Last-iterate Global Convergence Of Policy Gradients For Constrained Reinforcement Learning (2024)
- Reinforcement Learning Beyond Expectation (2021)
- Policy Gradient Algorithms With Monte Carlo Tree Learning For Non-Markov Decision Processes (2022)
- Policy Gradient For Reinforcement Learning With General Utilities (2022)
- Why Policy Gradient Algorithms Work For Undiscounted Total-reward MDPs (2025)
- On The Global Optimality Of Policy Gradient Methods In General Utility Reinforcement Learning (2024)