Distributional Reinforcement Learning With Quantile Regression
2017 · Will Dabney, Mark Rowland, Marc G. Bellemare, et al.
Abstract
In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward. When sampled probabilistically, these state transitions, rewards, and actions can all induce randomness in the observed long-term return. Traditionally, reinforcement learning algorithms average over this randomness to estimate the value function. In this paper, we build on recent work advocating a distributional approach to reinforcement learning in which the distribution over returns is modeled explicitly instead of only estimating the mean. That is, we examine methods of learning the value distribution instead of the value function. We give results that close a number of gaps between the theoretical and algorithmic results given by Bellemare, Dabney, and Munos (2017). First, we extend existing results to the approximate distribution setting. Second, we present a novel distributional reinforcement learning algorithm consistent with our theoretical formulation.
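The abstract contrasts learning the whole value distribution with estimating only its mean. The sketch below illustrates the quantile-regression idea named in the title in a deliberately simplified setting. It assumes a single state with a fixed, directly sampled return, with no bootstrapping, no neural network and no Huber smoothing. N estimates theta_i are each trained with the quantile (pinball) loss at the midpoint levels tau_i = (2i - 1) / (2N). The helper names (quantile_midpoints, pinball_grad, fit_quantiles) are hypothetical and do not come from the paper's code. This is not the paper's full algorithm.

```python
import random


def quantile_midpoints(n):
    # Quantile levels tau_i = (2i - 1) / (2N) for i = 1..N (the loop index starts at 0).
    return [(2 * i + 1) / (2 * n) for i in range(n)]


def pinball_grad(theta, tau, sample):
    # Subgradient w.r.t. theta of the quantile (pinball) loss
    # rho_tau(u) = u * (tau - 1[u < 0]), where u = sample - theta.
    u = sample - theta
    return -(tau - (1.0 if u < 0 else 0.0))


def fit_quantiles(sample_return, n_quantiles=4, steps=30_000, lr=0.01, seed=0):
    # Hypothetical helper: stochastic (sub)gradient descent on the pinball loss,
    # one estimate theta_i per quantile level, all fed the same sampled returns.
    rng = random.Random(seed)
    taus = quantile_midpoints(n_quantiles)
    theta = [0.0] * n_quantiles
    for _ in range(steps):
        g = sample_return(rng)  # one sampled return G
        for i, tau in enumerate(taus):
            theta[i] -= lr * pinball_grad(theta[i], tau, g)
    return theta


# Two illustrative return distributions with the same mean (5.0) but different spread.
def bimodal(rng):
    return 0.0 if rng.random() < 0.5 else 10.0


def constant(rng):
    return 5.0


q_bi = fit_quantiles(bimodal)
q_const = fit_quantiles(constant)
# The mean of the quantile estimates approximates E[G] in both cases.
print("bimodal:", [round(q, 2) for q in q_bi], "mean:", round(sum(q_bi) / len(q_bi), 2))
print("constant:", [round(q, 2) for q in q_const], "mean:", round(sum(q_const) / len(q_const), 2))
```

Both fits should report a mean near 5. Only the quantile estimates tell the returns apart: the bimodal estimates settle near 0 and 10, while the constant ones settle near 5. The paper's method additionally bootstraps the target through the distributional Bellman operator and uses a Huber-smoothed variant of this loss. Both pieces are omitted here for brevity.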
Related papers
- Distributional Reinforcement Learning With Dual Expectile-quantile Regression (2023)
- Fully Parameterized Quantile Function For Distributional Reinforcement Learning (2019)
- A Differential Perspective On Distributional Reinforcement Learning (2025)
- Value-distributional Model-based Reinforcement Learning (2023)
- A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In Distributional Reinforcement Learning (2024)
- Non-decreasing Quantile Function Network With Efficient Exploration For Distributional Reinforcement Learning (2021)
- The Statistical Benefits Of Quantile Temporal-difference Learning For Value Estimation (2023)
- A Distributional Perspective On Reinforcement Learning (2017)