A Risk-sensitive Approach To Policy Optimization
2022 · Jared Markowitz, Ryan W. Gardner, Ashley Llorens, et al.
Abstract
Standard deep reinforcement learning (DRL) aims to maximize expected reward, considering collected experiences equally in formulating a policy. This differs from human decision-making, where gains and losses are valued differently and outlying outcomes are given increased consideration. It also fails to capitalize on opportunities to improve safety and/or performance through the incorporation of distributional context. Several approaches to distributional DRL have been investigated, with one popular strategy being to evaluate the projected distribution of returns for possible actions. We propose a more direct approach whereby risk-sensitive objectives, specified in terms of the cumulative distribution function (CDF) of the distribution of full-episode rewards, are optimized. This approach allows for outcomes to be weighed based on relative quality, can be used for both continuous and discrete action spaces, and may naturally be applied in both constrained and unconstrained settings. We
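As an illustration of what weighing outcomes by their position in the CDF of full-episode rewards can look like, here is a minimal sketch. It is not the authors' algorithm: the function names, the lower-tail weighting, and the sample returns are all assumptions made for this example. Each episode's total reward is mapped to its empirical CDF value, and a user-chosen weight function of that value decides how much the episode counts toward the objective.

```python
import numpy as np


def empirical_cdf(returns):
    """For each episode, the fraction of episodes with return <= its own."""
    returns = np.asarray(returns, dtype=float)
    sorted_returns = np.sort(returns)
    # side="right" counts how many sorted values are <= each return.
    counts = np.searchsorted(sorted_returns, returns, side="right")
    return counts / returns.size


def cdf_weighted_objective(returns, weight_fn):
    """Weighted mean of episode returns; weights depend only on CDF values."""
    returns = np.asarray(returns, dtype=float)
    weights = weight_fn(empirical_cdf(returns))
    # Assumes at least one episode receives a nonzero weight.
    return float(np.sum(weights * returns) / np.sum(weights))


def lower_tail(alpha):
    """Hypothetical risk-averse weighting: keep only the worst alpha fraction."""
    return lambda u: (u <= alpha).astype(float)


# Made-up full-episode rewards, for illustration only.
episode_returns = [1.0, 2.0, 3.0, 4.0, 10.0]

# Uniform weights recover the usual expected-reward objective.
risk_neutral = cdf_weighted_objective(episode_returns, lambda u: np.ones_like(u))

# Lower-tail weights average only the two worst episodes here.
risk_averse = cdf_weighted_objective(episode_returns, lower_tail(0.5))
```

With uniform weights the objective is the ordinary sample mean, while the lower-tail choice behaves like a conditional value-at-risk estimate. Other weight functions of the CDF value would emphasize different parts of the outcome distribution.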
Related papers
- Distributional Method For Risk Averse Reinforcement Learning (2023)
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026)
- Pitfall Of Optimism: Distributional Reinforcement Learning By Randomizing Risk Criterion (2023)
- DRL-ORA: Distributional Reinforcement Learning With Online Risk Adaption (2023)
- Improving Robustness Via Risk Averse Distributional Reinforcement Learning (2020)
- One Risk To Rule Them All: A Risk-sensitive Perspective On Model-based Offline Reinforcement Learning (2022)
- Towards Safe Reinforcement Learning Via Constraining Conditional Value-at-risk (2022)
- On The Foundation Of Distributionally Robust Reinforcement Learning (2023)