Bayesian Risk-sensitive Policy Optimization For MDPs With General Loss Functions
2025 · Xiaoshuang Wang, Yifan Lin, Enlu Zhou
Abstract
Motivated by many application problems, we consider Markov decision processes (MDPs) with a general loss function and unknown parameters. To mitigate the epistemic uncertainty associated with unknown parameters, we take a Bayesian approach to estimate the parameters from data and impose a coherent risk functional (with respect to the Bayesian posterior distribution) on the loss. Since this formulation usually does not satisfy the interchangeability principle, it does not admit Bellman equations and cannot be solved by approaches based on dynamic programming. Therefore, we propose a policy gradient optimization method, leveraging the dual representation of coherent risk measures and extending the envelope theorem to continuous cases. We then show the stationary analysis of the algorithm with a convergence rate of \(\mathcal{O}(T^{-1/2}+r^{-1/2})\), where \(T\) is the number of policy gradient iterations and \(r\) is the sample size of the gradient estimator. We further extend our
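To make the dual-representation idea in the abstract concrete, here is a minimal finite-sample sketch in Python. It is not the authors' algorithm. It assumes CVaR as the coherent risk measure (the abstract does not name one), a handful of equally weighted posterior samples `omegas` of the unknown parameter, and a toy quadratic loss `(theta - omega)^2`. The function names `cvar_dual` and `risk_gradient` are hypothetical. The gradient weights each per-sample gradient by the dual maximizer, which is the finite-sample form of the envelope argument; the continuous extension described in the abstract is not reproduced here.

```python
def cvar_dual(losses, alpha):
    """CVaR_alpha of equally weighted samples via its dual (risk-envelope) form.

    Maximizes sum_i q_i * losses[i] over weights q with
    0 <= q_i <= 1 / (n * (1 - alpha)) and sum_i q_i = 1, for alpha in [0, 1).
    Returns the risk value and the maximizing weights q.
    """
    n = len(losses)
    cap = 1.0 / (n * (1.0 - alpha))
    q = [0.0] * n
    remaining = 1.0
    # Greedy maximizer: put as much weight as allowed on the largest losses.
    for i in sorted(range(n), key=lambda j: losses[j], reverse=True):
        q[i] = min(cap, remaining)
        remaining -= q[i]
        if remaining <= 0.0:
            break
    value = sum(qi * li for qi, li in zip(q, losses))
    return value, q


def risk_gradient(loss_grads, q):
    """Envelope-style gradient: per-sample gradients weighted by the dual maximizer."""
    return sum(qi * gi for qi, gi in zip(q, loss_grads))


# Toy usage (all values are illustrative assumptions).
omegas = [0.0, 1.0, 2.0, 3.0]  # stand-ins for posterior samples
theta = 0.0                    # scalar "policy" parameter
step_size = 0.05
for _ in range(200):
    losses = [(theta - w) ** 2 for w in omegas]
    _, q = cvar_dual(losses, alpha=0.5)
    grads = [2.0 * (theta - w) for w in omegas]
    theta -= step_size * risk_gradient(grads, q)
```

In this toy problem the two worst-case samples near the optimum are `omega = 0` and `omega = 3`, so the iterate settles at their midpoint, `theta = 1.5`. With `alpha = 0` the weights become uniform and the risk reduces to the posterior mean of the loss.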
Related papers
- Offline Bayesian Aleatoric And Epistemic Uncertainty Quantification And Posterior Value Optimisation In Finite-state MDPs (2024) · 0.95
- Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees (2025) · 0.00
- Entropic Risk Optimization In Discounted MDPs: Sample Complexity Bounds With A Generative Model (2025) · 0.00
- Policy Gradient For Robust Markov Decision Processes (2024) · 0.00
- Robust Bayesian Dynamic Programming For On-policy Risk-sensitive Reinforcement Learning (2025) · 0.00
- Bayesian Policy Optimization For Model Uncertainty (2018) · 0.00
- Optimistic Policy Optimization Is Provably Efficient In Non-stationary MDPs (2021) · 0.00
- Policy Optimization For Constrained MDPs With Provable Fast Global Convergence (2021) · 0.00