Statistical Efficiency Of Distributional Temporal Difference Learning And Freedman's Inequality In Hilbert Spaces
2024 · Yang Peng, Liangyu Zhang, Zhihua Zhang
Abstract
Distributional reinforcement learning (DRL) has achieved empirical success in various domains. One core task in DRL is distributional policy evaluation, which involves estimating the return distribution \(\eta^\pi\) for a given policy \(\pi\). Distributional temporal difference learning (distributional TD), which extends classic temporal difference learning (TD) in RL, has been proposed for this task. In this paper, we focus on the non-asymptotic statistical rates of distributional TD. To facilitate theoretical analysis, we propose non-parametric distributional TD (NTD). For a \(\gamma\)-discounted infinite-horizon tabular Markov decision process, we show that NTD with a generative model needs \(\tilde{O}(\epsilon^{-2}\mu_{\min}^{-1}(1-\gamma)^{-3})\) interactions with the environment to achieve an \(\epsilon\)-optimal estimator with high probability, when the estimation error is measured by the \(1\)-Wasserstein distance. This sample complexity bound is minimax optimal up to logarithmic factors.
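To make the two quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It evaluates only the leading-order scaling \(\epsilon^{-2}\mu_{\min}^{-1}(1-\gamma)^{-3}\), dropping the constants and logarithmic factors hidden by \(\tilde{O}\), and computes the \(1\)-Wasserstein distance between two discrete distributions supported on a shared grid. The function names `sample_complexity_scale` and `wasserstein1` are hypothetical, and since the abstract does not define \(\mu_{\min}\), it is treated here simply as a positive number supplied by the caller.

```python
import numpy as np


def sample_complexity_scale(eps, mu_min, gamma):
    """Leading-order scaling eps^-2 * mu_min^-1 * (1 - gamma)^-3.

    Assumption: constants and log factors inside the tilde-O are omitted,
    so this is useful only for comparing settings, not as an absolute count.
    """
    if not (eps > 0 and mu_min > 0 and 0 <= gamma < 1):
        raise ValueError("need eps > 0, mu_min > 0, and 0 <= gamma < 1")
    return 1.0 / (eps**2 * mu_min * (1.0 - gamma) ** 3)


def wasserstein1(atoms, p, q):
    """1-Wasserstein distance between two distributions on the same atoms.

    `atoms` must be sorted in increasing order; `p` and `q` are probability
    vectors over those atoms. In one dimension the distance equals the
    integral of |F_p - F_q|, which is a finite sum for discrete supports.
    """
    atoms = np.asarray(atoms, dtype=float)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # CDF gap on each interval [atoms[i], atoms[i+1]); the last entry is zero.
    cdf_gap = np.abs(np.cumsum(p) - np.cumsum(q))[:-1]
    return float(np.sum(cdf_gap * np.diff(atoms)))


if __name__ == "__main__":
    # Halving eps multiplies the scaling by 4; the effective horizon
    # 1 / (1 - gamma) enters with a cubic dependence.
    print(sample_complexity_scale(0.1, 0.5, 0.9))
    print(sample_complexity_scale(0.05, 0.5, 0.9))
    # Shifting all mass one atom to the right moves it a distance of 1.
    print(wasserstein1([0, 1, 2], [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```

The sketch illustrates why the \((1-\gamma)^{-3}\) factor dominates in practice: under this scaling, moving from \(\gamma = 0.9\) to \(\gamma = 0.99\) raises the leading term by a factor of about 1000, whereas halving \(\epsilon\) raises it only fourfold.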
Related papers
- Nonlinear Distributional Gradient Temporal-difference Learning (2018)
- Discerning Temporal Difference Learning (2023)
- The Nature Of Temporal Difference Errors In Multi-step Distributional Reinforcement Learning (2022)
- Accelerated Distributional Temporal Difference Learning With Linear Function Approximation (2025)
- The Statistical Benefits Of Quantile Temporal-difference Learning For Value Estimation (2023)
- A Differential Perspective On Distributional Reinforcement Learning (2025)
- An Analysis Of Quantile Temporal-difference Learning (2023)
- A Comparative Analysis Of Expected And Distributional Reinforcement Learning (2019)