Reinforcement Learning With Markov Risk Measures And Multipattern Risk Approximation
2026 · Andrzej Ruszczynski, Tiangang Zhang
Abstract
arXiv:2605.00654v1. For a risk-averse finite-horizon Markov Decision Problem, we introduce a special class of Markov coherent risk measures, called mini-batch measures. We also define the class of multipattern risk-averse problems that generalizes the class of linear systems. We use both concepts in a feature-based \(Q\)-learning method with multipattern \(Q\)-factor approximation, and we prove a high-probability regret bound of \(\mathcal{O}\big(H^2 N^H \sqrt{K}\big)\), where \(H\) is the horizon, \(N\) is the mini-batch size, and \(K\) is the number of episodes. We also propose an economical version of the \(Q\)-learning method that streamlines the policy evaluation (backward) step. The theoretical results are illustrated on a stochastic assignment problem and a short-horizon multi-armed bandit problem.
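To give a feel for how the regret bound quoted in the abstract scales, the following minimal sketch evaluates only its growth term, H^2 * N^H * sqrt(K). It is an illustration, not the paper's method: the constant and any logarithmic factors hidden by the big-O notation are not given in the abstract and are dropped here, and the function names `regret_bound_scale` and `per_episode_scale` are hypothetical.

```python
import math


def regret_bound_scale(H: int, N: int, K: int) -> float:
    """Growth term H^2 * N^H * sqrt(K) of the bound stated in the abstract.

    Assumption: the hidden constant and any log factors are omitted, so only
    ratios between values are meaningful, not the absolute numbers.
    """
    return (H ** 2) * (N ** H) * math.sqrt(K)


def per_episode_scale(H: int, N: int, K: int) -> float:
    """Growth term divided by K, i.e. the average-regret scale per episode."""
    return regret_bound_scale(H, N, K) / K


if __name__ == "__main__":
    H, N = 3, 2  # illustrative horizon and mini-batch size
    for K in (100, 400, 1600):
        total = regret_bound_scale(H, N, K)
        average = per_episode_scale(H, N, K)
        print(f"K={K:5d}  total scale={total:8.1f}  per-episode scale={average:.2f}")
```

Under these assumptions, quadrupling the number of episodes K doubles the total growth term and halves the per-episode term, so the average regret scale shrinks like 1/sqrt(K). The factor N^H grows exponentially in the horizon whenever N > 1, which fits the abstract's choice of a short-horizon bandit problem as one of its illustrations.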
Related papers
- Model-based Reinforcement Learning With Multinomial Logistic Function Approximation (2022)
- Online Bayesian Risk-averse Reinforcement Learning (2025)
- Risk Bounds And Rademacher Complexity In Batch Reinforcement Learning (2021)
- Conditionally Elicitable Dynamic Risk Measures For Deep Reinforcement Learning (2022)
- Distributional Method For Risk Averse Reinforcement Learning (2023)
- Taming Equilibrium Bias In Risk-sensitive Multi-agent Reinforcement Learning (2024)
- A Policy Gradient Approach For Optimization Of Smooth Risk Measures (2022)
- Non-stationary Risk-sensitive Reinforcement Learning: Near-optimal Dynamic Regret, Adaptive Detection, And Separation Design (2022)