The Statistical Benefits Of Quantile Temporal-difference Learning For Value Estimation
2023 · Mark Rowland, Yunhao Tang, Clare Lyle, et al.
Abstract
We study the problem of temporal-difference-based policy evaluation in reinforcement learning. In particular, we analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD), for this task. We reach the surprising conclusion that even if a practitioner has no interest in the return distribution beyond the mean, QTD (which learns predictions about the full distribution of returns) may offer performance superior to approaches such as classical TD learning, which predict only the mean return, even in the tabular setting.
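To make the comparison concrete, here is a minimal tabular sketch of the two kinds of update the abstract contrasts. It follows the commonly stated forms of TD(0) and QTD rather than anything given on this page. The function names, the midpoint quantile levels, and the list-based tables are assumptions chosen for illustration.

```python
def td_update(values, state, reward, next_state, alpha, gamma):
    """Classical TD(0): move V(state) toward the one-step target."""
    target = reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])


def qtd_update(quantiles, state, reward, next_state, alpha, gamma):
    """QTD sketch: shift each quantile estimate by a bounded amount.

    quantiles[s] is a list of m quantile estimates for state s
    (hypothetical layout, assumed for this example).
    """
    m = len(quantiles[state])
    # One bootstrapped target per quantile estimate at the next state.
    targets = [reward + gamma * q for q in quantiles[next_state]]
    old = list(quantiles[state])  # pre-update estimates, used for every i
    for i in range(m):
        tau = (2 * i + 1) / (2 * m)  # assumed midpoint quantile level
        # Fraction of targets lying strictly below the current estimate.
        below = sum(1 for t in targets if t < old[i]) / m
        quantiles[state][i] = old[i] + alpha * (tau - below)


def qtd_mean(quantiles, state):
    """Mean-return estimate: the average of the quantile estimates."""
    return sum(quantiles[state]) / len(quantiles[state])


# Usage: two states, transitions from state 0 to state 1.
values = [0.0, 1.0]
td_update(values, 0, reward=1.0, next_state=1, alpha=0.5, gamma=0.9)

quantiles = [[0.0, 0.0], [1.0, 2.0]]
qtd_update(quantiles, 0, reward=0.0, next_state=1, alpha=0.1, gamma=1.0)
mean_estimate = qtd_mean(quantiles, 0)
```

One difference visible in this sketch: each QTD step changes an estimate by at most `alpha`, whatever the size of the reward, whereas the TD step scales with the size of the TD error.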
Related papers
- An Analysis Of Quantile Temporal-difference Learning (2023)
- On The Statistical Benefits Of Temporal Difference Learning (2023)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Distributional Reinforcement Learning With Quantile Regression (2017)
- Preferential Temporal Difference Learning (2021)
- Temporal-difference Value Estimation Via Uncertainty-guided Soft Updates (2021)
- Discerning Temporal Difference Learning (2023)
- The Nature Of Temporal Difference Errors In Multi-step Distributional Reinforcement Learning (2022)