Distributional Method For Risk Averse Reinforcement Learning
2023 · Ziteng Cheng, Sebastian Jaimungal, Nick Martin
Abstract
We introduce a distributional method for learning the optimal policy in risk-averse Markov decision processes with finite state and action spaces, latent costs, and stationary dynamics. We assume sequential observations of states, actions, and costs, and assess the performance of a policy using dynamic risk measures constructed from nested Kusuoka-type conditional risk mappings. For such performance criteria, randomized policies may outperform deterministic policies; therefore, the candidate policies lie in the d-dimensional simplex, where d is the cardinality of the action space. Existing risk-averse reinforcement learning methods seldom consider randomized policies, and naive extensions to the current setting suffer from the curse of dimensionality. By exploiting certain structures embedded in the corresponding dynamic programming principle, we propose a distributional learning method for seeking the optimal policy. The conditional distribution of the value function is cast into a specific type o
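As a rough illustration of the kind of risk mapping the abstract refers to, the sketch below evaluates a single-step, finite mixture of CVaR terms on a discrete cost distribution, and applies it to the cost distribution induced by a randomized policy. It is not the paper's algorithm: the function names (`cvar`, `kusuoka_mixture`, `policy_cost_distribution`), the choice of a fixed finite mixture, and the example numbers are all assumptions made for illustration, and the nesting over time is omitted.

```python
import numpy as np

def cvar(values, probs, alpha):
    """CVaR at level alpha of a discrete cost (larger = worse):
    the average cost over the worst (1 - alpha) share of probability mass."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    order = np.argsort(-values)              # worst (largest) costs first
    v, p = values[order], probs[order]
    tail = 1.0 - alpha
    # Probability mass already consumed before each atom is reached.
    mass_before = np.concatenate(([0.0], np.cumsum(p)[:-1]))
    # Take mass from each atom until the tail budget is used up.
    taken = np.clip(tail - mass_before, 0.0, p)
    return float(np.dot(taken, v) / tail)

def kusuoka_mixture(values, probs, levels, weights):
    """Weighted sum of CVaR terms (assumed form: one fixed finite mixture)."""
    return sum(w * cvar(values, probs, a) for a, w in zip(levels, weights))

def policy_cost_distribution(pi, action_values, action_probs):
    """One-step cost distribution when action a is drawn with probability pi[a]."""
    values = np.concatenate([np.asarray(v, dtype=float) for v in action_values])
    probs = np.concatenate(
        [w * np.asarray(p, dtype=float) for w, p in zip(pi, action_probs)]
    )
    return values, probs

# Hypothetical two-action example: a sure cost of 1, or a rare cost of 10.
action_values = [[1.0], [0.0, 10.0]]
action_probs = [[1.0], [0.9, 0.1]]
pi = [0.5, 0.5]                               # a point in the policy simplex
mixed_values, mixed_probs = policy_cost_distribution(pi, action_values, action_probs)
risk = kusuoka_mixture(mixed_values, mixed_probs, levels=[0.0, 0.9], weights=[0.5, 0.5])
```

Level 0 recovers the expected cost, so the weights here trade off the mean against the tail. In the dynamic setting described in the abstract, a mapping of this kind would be applied conditionally at each step rather than once.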
Related papers
- A Risk-sensitive Approach To Policy Optimization (2022) · 3.58
- Pitfall Of Optimism: Distributional Reinforcement Learning By Randomizing Risk Criterion (2023) · 0.00
- Conjugated Discrete Distributions For Distributional Reinforcement Learning (2021) · 0.00
- Distributionally Robust Model-based Reinforcement Learning With Large State Spaces (2023) · 0.00
- Risk Aware And Multi-objective Decision Making With Distributional Monte Carlo Tree Search (2021) · 0.00
- On The Foundation Of Distributionally Robust Reinforcement Learning (2023) · 0.00
- Distributional Soft Actor-critic With Diffusion Policy (2025) · 0.00
- Improving Robustness Via Risk Averse Distributional Reinforcement Learning (2020) · 0.00