Pitfall Of Optimism: Distributional Reinforcement Learning By Randomizing Risk Criterion
2023 · Taehyun Cho, Seungyub Han, Heesoo Lee, et al.
Abstract
Distributional reinforcement learning algorithms have attempted to utilize estimated uncertainty for exploration, such as optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may cause biased data collection and hinder convergence or performance. In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing the risk criterion to avoid a one-sided tendency toward risk. We provide a perturbed distributional Bellman optimality operator by distorting the risk measure and prove the convergence and optimality of the proposed method under a weaker contraction property. Our theoretical results support that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return. Finally, we empirically show that our method outperforms other existing distribution-based algorithms in various environments, including 55 Atari games.
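The action-selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `perturb_scale` parameter, the use of Dirichlet-distributed weights, and the mixing with uniform weights are all assumptions made for illustration. Each action is assumed to have a list of estimated return quantiles; a random weighting over those quantiles stands in for a randomly drawn risk criterion, and the action with the highest weighted score is chosen.

```python
import random


def sample_risk_weights(n_quantiles, perturb_scale, rng):
    """Draw random quantile weights (hypothetical helper, not from the paper).

    Normalized Gamma(1, 1) draws give a sample from a symmetric Dirichlet
    distribution, i.e. a random point on the probability simplex.
    perturb_scale in [0, 1] mixes it with uniform (risk-neutral) weights.
    """
    gammas = [rng.gammavariate(1.0, 1.0) for _ in range(n_quantiles)]
    total = sum(gammas)
    uniform = 1.0 / n_quantiles
    return [
        (1.0 - perturb_scale) * uniform + perturb_scale * g / total
        for g in gammas
    ]


def select_action(quantiles, perturb_scale, rng):
    """Pick the action maximizing a randomly weighted average of quantiles.

    quantiles: list of per-action lists of estimated return quantiles,
    all of the same length (an assumed representation).
    """
    weights = sample_risk_weights(len(quantiles[0]), perturb_scale, rng)
    scores = [sum(w * q for w, q in zip(weights, row)) for row in quantiles]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, weights


# Usage: two actions, three quantile estimates each.
rng = random.Random(0)
quantiles = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.5]]
action, weights = select_action(quantiles, perturb_scale=1.0, rng=rng)
```

With `perturb_scale=0.0` the weights are uniform and the choice reduces to greedy selection on the mean return. Larger values let the sampled weights favor low or high quantiles at random, so the agent is not consistently risk-seeking or risk-averse. Reducing `perturb_scale` over time is one possible design, stated here as an assumption rather than taken from the abstract.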
Related papers
- Toward Risk-based Optimistic Exploration For Cooperative Multi-agent Reinforcement Learning (2023)
- Distributional Method For Risk Averse Reinforcement Learning (2023)
- Off-policy Reinforcement Learning With Optimistic Exploration And Distribution Correction (2021)
- A Risk-sensitive Approach To Policy Optimization (2022)
- On Optimistic Versus Randomized Exploration In Reinforcement Learning (2017)
- Improving Robustness Via Risk Averse Distributional Reinforcement Learning (2020)
- A Distributional Perspective On Reinforcement Learning (2017)
- Optimism As Risk-seeking In Multi-agent Reinforcement Learning (2025)