Bridging Distributional And Risk-sensitive Reinforcement Learning With Provable Regret Bounds
2022 · Hao Liang, Zhi-Quan Luo
Abstract
We study regret guarantees for risk-sensitive reinforcement learning (RSRL) via distributional reinforcement learning (DRL) methods. In particular, we consider finite episodic Markov decision processes whose objective is the entropic risk measure (EntRM) of the return. By leveraging a key property of the EntRM, the independence property, we establish a risk-sensitive distributional dynamic programming framework. We then propose two novel DRL algorithms that implement optimism through two different schemes, one model-free and one model-based. We prove that both attain an \(\tilde{\mathcal{O}}\left(\frac{\exp(|\beta| H)-1}{|\beta|}H\sqrt{S^2AK}\right)\) regret upper bound, where \(S\), \(A\), \(K\), and \(H\) denote the number of states, the number of actions, the number of episodes, and the time horizon, respectively. This matches the bound of RSVI2 proposed in \cite{fei2021exponential}, with a novel distributional analysis. To the best of our knowledge, this is the first regret analysis that bridges DRL and
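As a rough illustration of the two quantities named in the abstract, the sketch below evaluates the entropic risk measure of a discrete return and the \(\beta\)-dependent factor of the regret bound. It assumes the standard definition \(U_\beta(X) = \frac{1}{\beta}\log \mathbb{E}[e^{\beta X}]\) and the usual sign convention (\(\beta < 0\) risk-averse, \(\beta > 0\) risk-seeking), neither of which is spelled out in the abstract itself. The function names `entropic_risk` and `regret_factor` are hypothetical and do not come from the paper.

```python
import math

def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete return X.

    Assumed standard definition; beta == 0 is treated as the risk-neutral mean.
    """
    if beta == 0:
        return sum(p * v for v, p in zip(values, probs))
    # Shift by the largest exponent (log-sum-exp trick) for numerical stability.
    m = max(beta * v for v in values)
    s = sum(p * math.exp(beta * v - m) for v, p in zip(values, probs))
    return (m + math.log(s)) / beta

def regret_factor(beta, horizon):
    """(exp(|beta| * H) - 1) / |beta|, the beta-dependent factor in the stated bound."""
    b = abs(beta)
    if b == 0:
        return float(horizon)  # limit of the factor as beta -> 0
    return math.expm1(b * horizon) / b

# A return that is 0 or 1 with equal probability.
values = [0.0, 1.0]
probs = [0.5, 0.5]
risk_averse = entropic_risk(values, probs, beta=-1.0)
risk_neutral = entropic_risk(values, probs, beta=0.0)
risk_seeking = entropic_risk(values, probs, beta=1.0)
```

Under these assumptions the risk-averse value falls below the mean and the risk-seeking value above it, and `regret_factor` approaches \(H\) as \(\beta \to 0\), so the exponential dependence on \(|\beta| H\) only matters away from the risk-neutral regime.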
Related papers
- Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, And Separation Design (2022)
- Exponential Bellman Equation And Improved Regret Bounds For Risk-sensitive Reinforcement Learning (2021)
- Robust Bayesian Dynamic Programming For On-policy Risk-sensitive Reinforcement Learning (2025)
- DRL-ORA: Distributional Reinforcement Learning With Online Risk Adaption (2023)
- Continuous-time Risk-sensitive Reinforcement Learning Via Quadratic Variation Penalty (2024)
- Unified Framework Of Distributional Regret In Multi-armed Bandits And Reinforcement Learning (2026)
- Model-based Reinforcement Learning With Multinomial Logistic Function Approximation (2022)
- A Risk-sensitive Approach To Policy Optimization (2022)