Continuous-time Risk-sensitive Reinforcement Learning Via Quadratic Variation Penalty
2024 · Yanwei Jia
Abstract
This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either as the agent's risk attitude or as a distributionally robust approach against model uncertainty. Owing to the martingale perspective in Jia and Zhou (J Mach Learn Res 24(161): 1-61, 2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, capturing the variability of the value-to-go along the trajectory. This characterization allows for the straightforward adaptation of existing RL algorithms developed for non-risk-sensitive scenarios to incorporate risk sensitivity by adding the realized variance of the value process. Additionally, I highlight that the conventio
Related papers
- Optimal Scheduling Of Entropy Regulariser For Continuous-time Linear-quadratic Reinforcement Learning (2022) 4.52
- Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, And Separation Design (2022) 3.58
- Exploration Versus Exploitation In Reinforcement Learning: A Stochastic Control Approach (2018) 9.76
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024) 0.00
- Robust Reinforcement Learning Under Diffusion Models For Data With Jumps (2024) 0.00
- Exponential Bellman Equation And Improved Regret Bounds For Risk-sensitive Reinforcement Learning (2021) 0.00
- Q-learning In Continuous Time (2022) 0.00
- Ergodic Risk Measures: Towards A Risk-aware Foundation For Continual Reinforcement Learning (2025) 0.00