Planning And Learning In Average Risk-aware MDPs
2025 · Weikai Wang, Erick Delage
Abstract
For continuing tasks, average-cost Markov decision processes have well-documented value and can be solved using efficient algorithms. However, this framework explicitly assumes that the agent is risk-neutral. In this work, we extend risk-neutral algorithms to accommodate the more general class of dynamic risk measures. Specifically, we propose a relative value iteration (RVI) algorithm for planning and design two model-free Q-learning algorithms: a generic algorithm based on the multi-level Monte Carlo (MLMC) method, and an off-policy algorithm dedicated to utility-based shortfall risk measures. Both the RVI and MLMC-based Q-learning algorithms are proven to converge to optimality. Numerical experiments validate our analysis, empirically confirm the convergence of the off-policy algorithm, and demonstrate that our approach enables the identification of policies that are finely tuned to the intricate risk-awareness of the agent that they serve.
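To make the planning idea concrete, the following is a minimal sketch of relative value iteration in which the usual expectation over next states is replaced by a one-step risk mapping. It is not the authors' algorithm: the entropic risk measure, the toy two-state MDP, and all names (`entropic_risk`, `risk_aware_rvi`, `P`, `c`, `beta`) are assumptions chosen for illustration, and the sketch makes no claim about convergence beyond this small example.

```python
import math


def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] for beta >= 0.

    beta = 0 is treated as the risk-neutral case (plain expectation).
    """
    if beta == 0.0:
        return sum(p * v for p, v in zip(probs, values))
    m = max(values)  # shift by the maximum for numerical stability
    total = sum(p * math.exp(beta * (v - m)) for p, v in zip(probs, values))
    return m + math.log(total) / beta


def risk_aware_rvi(P, c, beta, ref_state=0, iters=500, tol=1e-10):
    """Relative value iteration with a one-step entropic risk mapping.

    P[s][a] is a list of next-state probabilities and c[s][a] is the
    immediate cost of action a in state s (hypothetical data layout).
    Returns (gain estimate, relative values h, greedy policy).
    """
    n = len(P)
    h = [0.0] * n
    gain = 0.0

    def q_value(s, a, h_vec):
        # Costs are deterministic given (s, a), so they can be added
        # outside the risk mapping (translation invariance).
        return c[s][a] + entropic_risk(h_vec, P[s][a], beta)

    for _ in range(iters):
        t = [min(q_value(s, a, h) for a in range(len(P[s]))) for s in range(n)]
        gain = t[ref_state]               # offset taken at the reference state
        new_h = [v - gain for v in t]     # keeps h[ref_state] pinned at 0
        done = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if done:
            break

    policy = [
        min(range(len(P[s])), key=lambda a, s=s: q_value(s, a, h))
        for s in range(n)
    ]
    return gain, h, policy


# Toy MDP (an assumption for illustration only).
# State 0: action 0 is "safe" (cost 1.0, stay in state 0);
#          action 1 is "risky" (cost 0.5, 10% chance of moving to state 1).
# State 1: single action, cost 5.0, returns to state 0.
P = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[1.0, 0.0]],
]
c = [
    [1.0, 0.5],
    [5.0],
]

gain_neutral, h_neutral, policy_neutral = risk_aware_rvi(P, c, beta=0.0)
gain_averse, h_averse, policy_averse = risk_aware_rvi(P, c, beta=1.0)
```

In this toy instance the risk-neutral run (`beta=0.0`) selects the risky action in state 0, while the risk-averse run (`beta=1.0`) switches to the safe action, which is the kind of risk-dependent policy change the abstract refers to.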
Related papers
- Learning And Planning In Average-reward Markov Decision Processes (2020)
- Efficient Algorithms For Mitigating Uncertainty And Risk In Reinforcement Learning (2025)
- Bayesian Risk-sensitive Policy Optimization For MDPs With General Loss Functions (2025)
- Decentralised Q-learning For Multi-agent Markov Decision Processes With A Satisfiability Criterion (2023)
- Stochastic First-order Methods For Average-reward Markov Decision Processes (2022)
- Efficient Learning For Entropy-regularized Markov Decision Processes Via Multilevel Monte Carlo (2025)
- Learning And Planning For Time-varying MDPs Using Maximum Likelihood Estimation (2019)
- Burning RED: Unlocking Subtask-driven Reinforcement Learning And Risk-awareness In Average-reward Markov Decision Processes (2024)