Online Bayesian Risk-averse Reinforcement Learning
2025 · Yuhao Wang, Enlu Zhou
Abstract
In this paper, we study the Bayesian risk-averse formulation in reinforcement learning (RL). To address the epistemic uncertainty due to a lack of data, we adopt the Bayesian Risk Markov Decision Process (BRMDP) to account for the parameter uncertainty of the unknown underlying model. We derive the asymptotic normality that characterizes the difference between the Bayesian risk value function and the original value function under the true unknown distribution. The results indicate that the Bayesian risk-averse approach tends to pessimistically underestimate the original value function. This discrepancy increases with stronger risk aversion and decreases as more data become available. We then utilize this adaptive property in the setting of online RL as well as online contextual multi-arm bandits (CMAB), a special case of online RL. We provide two procedures using posterior sampling for both the general RL problem and the CMAB problem. We establish a sub-linear regret bound, with the re
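The abstract's qualitative claim (the risk-averse value sits below the true value, by a margin that grows with risk aversion and shrinks with data) can be illustrated with a toy sketch. This is not the paper's BRMDP or its posterior sampling procedures. It assumes a single Bernoulli reward with a Beta(1, 1) prior and uses lower-tail CVaR as the risk functional; the function names, the true mean of 0.6, and the idealized data (exactly the expected number of successes) are all hypothetical choices made for illustration.

```python
import random


def lower_cvar(samples, alpha):
    """Average of the worst alpha-fraction of samples (lower tail, for rewards)."""
    ordered = sorted(samples)
    k = max(1, int(alpha * len(ordered)))
    return sum(ordered[:k]) / k


def bayesian_risk_value(successes, failures, alpha, n_draws=20000, seed=0):
    """Monte Carlo lower-tail CVaR of the posterior over a Bernoulli mean.

    Assumes a Beta(1, 1) prior, so the posterior is
    Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    draws = [rng.betavariate(1 + successes, 1 + failures) for _ in range(n_draws)]
    return lower_cvar(draws, alpha)


def gap(n, alpha, true_mean=0.6):
    """True mean minus the risk-averse value after n idealized observations."""
    successes = round(true_mean * n)  # idealized data, not a random sample
    return true_mean - bayesian_risk_value(successes, n - successes, alpha)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"n={n:5d}  alpha=0.2  gap={gap(n, 0.2):.4f}")
    for alpha in (0.5, 0.2, 0.05):
        print(f"n=  100  alpha={alpha}  gap={gap(100, alpha):.4f}")
```

In this sketch a smaller `alpha` means stronger risk aversion, since the average is taken over a thinner lower tail of the posterior. The gap stays positive, narrows as `n` grows, and widens as `alpha` shrinks.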
Related papers
- Robust Bayesian Dynamic Programming For On-policy Risk-sensitive Reinforcement Learning (2025)
- One Risk To Rule Them All: A Risk-sensitive Perspective On Model-based Offline Reinforcement Learning (2022)
- Long-horizon Model-based Offline Reinforcement Learning Without Conservatism (2025)
- Bayesian Risk-averse Q-learning With Streaming Observations (2023)
- Generalized Bayesian Deep Reinforcement Learning (2024)
- Epistemic Risk-sensitive Reinforcement Learning (2019)
- DRL-ORA: Distributional Reinforcement Learning With Online Risk Adaption (2023)
- A Bayesian Approach To Robust Inverse Reinforcement Learning (2023)