Nonlinear Distributional Gradient Temporal-difference Learning
2018 · Chao Qu, Shie Mannor, Huan Xu
Abstract
We devise a distributional variant of gradient temporal-difference (TD) learning. Distributional reinforcement learning has recently been shown to outperform its expectation-based counterpart (Bellemare et al., 2017). In the policy evaluation setting, we design two new algorithms, distributional GTD2 and distributional TDC, by applying the Cramér distance to the distributional version of the Bellman error objective function; they inherit the advantages of both nonlinear gradient TD algorithms and the distributional RL approach. In the control setting, we propose distributional Greedy-GQ using a similar derivation. We prove the asymptotic almost-sure convergence of distributional GTD2 and TDC to a locally optimal solution for general smooth function approximators, which include the neural networks widely used in recent work to solve real-life RL problems. In each step, the computational complexities of the above three algorithms are linear w.r.t. th
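As a rough illustration of the Cramér distance mentioned in the abstract, the sketch below computes its squared form between two categorical distributions. It assumes both distributions share the same equally spaced atoms; the function name `cramer_distance_sq` and the parameter `atom_spacing` are hypothetical and not taken from the paper, and this is not the authors' algorithm, only the distance it builds on.

```python
from itertools import accumulate


def cramer_distance_sq(p, q, atom_spacing=1.0):
    """Squared Cramér distance between two categorical distributions.

    Assumption: p and q are probability vectors over the same
    equally spaced atoms, so the integral of the squared CDF
    difference reduces to a finite sum scaled by the atom spacing.
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same atoms")
    cdf_p = accumulate(p)  # running sums give the CDF at each atom
    cdf_q = accumulate(q)
    return atom_spacing * sum((a - b) ** 2 for a, b in zip(cdf_p, cdf_q))


# Two distributions on three atoms; q is p shifted one atom to the right.
p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
d = cramer_distance_sq(p, q)
same = cramer_distance_sq(p, p)
```

Because the distance is a sum of squared CDF differences, it is differentiable in the probabilities, which is one reason it can serve as a gradient-based training objective.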
Related papers
- Statistical Efficiency Of Distributional Temporal Difference Learning And Freedman's Inequality In Hilbert Spaces (2024)
- Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning With Polynomial Sample Complexity (2020)
- Bayesian Distributional Policy Gradients (2021)
- New Versions Of Gradient Temporal Difference Learning (2021)
- Accelerated Distributional Temporal Difference Learning With Linear Function Approximation (2025)
- An Analysis Of Quantile Temporal-difference Learning (2023)
- Gradient Temporal-difference Learning With Regularized Corrections (2020)
- Backstepping Temporal Difference Learning (2023)