Conditionally Elicitable Dynamic Risk Measures For Deep Reinforcement Learning
2022 · Anthony Coache, Sebastian Jaimungal, Álvaro Cartea
Abstract
We propose a novel framework to solve risk-sensitive reinforcement learning (RL) problems where the agent optimises time-consistent dynamic spectral risk measures. Based on the notion of conditional elicitability, our methodology constructs (strictly consistent) scoring functions that are used as penalisers in the estimation procedure. Our contribution is threefold: we (i) devise an efficient approach to estimate a class of dynamic spectral risk measures with deep neural networks, (ii) prove that these dynamic spectral risk measures may be approximated to arbitrary accuracy using deep neural networks, and (iii) develop a risk-sensitive actor-critic algorithm that uses full episodes and does not require any additional nested transitions. We compare our conceptually improved reinforcement learning algorithm with the nested simulation approach and illustrate its performance in two settings: statistical arbitrage and portfolio allocation on both simulated and real data.
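The abstract does not spell out the scoring functions it constructs, so the following is only a minimal sketch of the underlying idea of elicitability, using the standard quantile (pinball) score as a stand-in. A score is strictly consistent for a statistic when its expected value is minimised exactly at that statistic; here the statistic is the alpha-quantile (value-at-risk). The function names, the grid search, and the Gaussian samples are illustrative assumptions and do not come from the paper, which estimates dynamic spectral risk measures with deep neural networks.

```python
import random


def pinball_score(z, outcomes, alpha):
    """Average quantile (pinball) score of the forecast z at level alpha.

    S(z, y) = (1{y <= z} - alpha) * (z - y), which is non-negative and,
    in expectation, minimised when z is the alpha-quantile of y.
    """
    total = 0.0
    for y in outcomes:
        indicator = 1.0 if y <= z else 0.0
        total += (indicator - alpha) * (z - y)
    return total / len(outcomes)


def elicit_quantile(outcomes, alpha, grid):
    """Return the grid point with the lowest average score.

    Grid search is a hypothetical stand-in for the gradient-based
    training a neural-network estimator would use.
    """
    return min(grid, key=lambda z: pinball_score(z, outcomes, alpha))


random.seed(0)
alpha = 0.9
# Illustrative data: standard normal samples (an assumption, not the paper's data).
outcomes = [random.gauss(0.0, 1.0) for _ in range(5000)]
grid = [-3.0 + 0.01 * i for i in range(601)]

z_star = elicit_quantile(outcomes, alpha, grid)

# Reference value: the order statistic at rank alpha * n.
sorted_outcomes = sorted(outcomes)
empirical_quantile = sorted_outcomes[round(alpha * len(outcomes)) - 1]

print(f"score minimiser:    {z_star:.3f}")
print(f"empirical quantile: {empirical_quantile:.3f}")
```

Minimising the average score recovers the empirical quantile up to the grid resolution, which is the property that lets a score act as a training loss: no nested simulation is needed to estimate the statistic, only samples and a loss to minimise.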
Related papers
- Robust Bayesian Dynamic Programming For On-policy Risk-sensitive Reinforcement Learning (2025)
- Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, And Separation Design (2022)
- Ergodic Risk Measures: Towards A Risk-aware Foundation For Continual Reinforcement Learning (2025)
- Epistemic Risk-sensitive Reinforcement Learning (2019)
- Provably Efficient Iterated CVaR Reinforcement Learning With Function Approximation And Human Feedback (2023)
- A Policy Gradient Approach For Optimization Of Smooth Risk Measures (2022)
- A Risk-sensitive Approach To Policy Optimization (2022)
- Bridging Distributional And Risk-sensitive Reinforcement Learning With Provable Regret Bounds (2022)