A Differential Perspective On Distributional Reinforcement Learning
2025 · Juan Sebastian Rojas, Chi-Guhn Lee
Abstract
To date, distributional reinforcement learning (distributional RL) methods have focused exclusively on the discounted setting, where an agent aims to optimize a discounted sum of rewards over time. In this work, we extend distributional RL to the average-reward setting, where an agent aims to optimize the reward received per time step. In particular, we use a quantile-based approach to develop the first set of algorithms that can successfully learn and/or optimize the long-run per-step reward distribution, as well as the differential return distribution, of an average-reward MDP. We derive provably convergent tabular algorithms for both prediction and control, as well as a broader family of algorithms with appealing scaling properties. Empirically, we find that these algorithms yield competitive and sometimes superior performance compared to their non-distributional equivalents, while also capturing rich information about the long-run per-step reward and differential return.
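To make the idea of a "quantile-based approach" to the long-run per-step reward distribution more concrete, here is a minimal sketch, not the authors' algorithm. It applies standard stochastic quantile tracking (a subgradient step on the quantile, or pinball, loss) to the rewards observed along a single trajectory of an ergodic Markov chain, which in the long run are distributed according to the chain's stationary distribution. The two-state chain `P`, the rewards in `REWARD`, the function `track_reward_quantiles`, and its parameters are all hypothetical choices for illustration, as is the assumption that averaging the midpoint quantile estimates gives a usable average-reward estimate.

```python
import random

# Hypothetical two-state ergodic Markov chain (illustrative, not from the paper):
# P[s][s2] is the probability of moving from state s to state s2.
P = [[0.9, 0.1], [0.2, 0.8]]
REWARD = [0.0, 3.0]  # hypothetical reward received on each step spent in state s


def track_reward_quantiles(num_quantiles=6, steps=100_000, alpha=0.002, seed=0):
    """Track quantiles of the long-run per-step reward with stochastic quantile updates.

    Each estimate theta[i] targets the tau_i-quantile, where
    tau_i = (2i + 1) / (2N) are the quantile midpoints.
    """
    rng = random.Random(seed)
    taus = [(2 * i + 1) / (2 * num_quantiles) for i in range(num_quantiles)]
    theta = [0.0] * num_quantiles
    state = 0
    for _ in range(steps):
        r = REWARD[state]
        for i, tau in enumerate(taus):
            # Subgradient step on the pinball loss: move up by alpha * tau
            # when r >= theta[i], down by alpha * (1 - tau) when r < theta[i].
            theta[i] += alpha * (tau - (1.0 if r < theta[i] else 0.0))
        # Sample the next state of the chain.
        state = 0 if rng.random() < P[state][0] else 1
    # Assumption: the mean of the midpoint quantiles approximates the average reward.
    avg_reward = sum(theta) / num_quantiles
    return taus, theta, avg_reward


if __name__ == "__main__":
    taus, theta, avg_reward = track_reward_quantiles()
    for tau, q in zip(taus, theta):
        print(f"tau={tau:.3f}  quantile estimate={q:.2f}")
    print(f"average-reward estimate={avg_reward:.2f}")
```

For this chain the stationary distribution is (2/3, 1/3), so the per-step reward is 0 with probability 2/3 and 3 with probability 1/3, and the average reward is 1. The first four quantile estimates should therefore settle near 0, the last two near 3, and their mean near 1. A decaying step size would be needed for convergence in the strict sense; the constant `alpha` keeps the sketch short.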
Related papers
- A Comparative Analysis Of Expected And Distributional Reinforcement Learning (2019)
- Distributional Reinforcement Learning With Quantile Regression (2017)
- A Distributional Perspective On Reinforcement Learning (2017)
- Distributional Reinforcement Learning For Multi-dimensional Reward Functions (2021)
- The Nature Of Temporal Difference Errors In Multi-step Distributional Reinforcement Learning (2022)
- Conjugated Discrete Distributions For Distributional Reinforcement Learning (2021)
- Distributional Reinforcement Learning With Dual Expectile-quantile Regression (2023)
- Normality-guided Distributional Reinforcement Learning For Continuous Control (2022)