A Distributional Analysis Of Sampling-based Reinforcement Learning Algorithms
2020 · Philip Amortila, Doina Precup, Prakash Panangaden, et al.
Abstract
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(\(\lambda\)) and \(Q\)-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
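The abstract's two central claims (the stationary distribution's mean equals the true value function, and the distribution concentrates as the step-size shrinks) can be illustrated numerically. The following is a minimal sketch, not the authors' construction: it assumes a hypothetical two-state Markov reward process, synchronous TD(0) updates with a sampled next state for every state, and illustrative names (`run_td0`, `TRUE_V`) that do not come from the paper. For this toy process the true value function works out by hand to V = (1.5, 0.5).

```python
import random
import statistics

# Hypothetical two-state Markov reward process (an assumption for illustration).
GAMMA = 0.5                      # discount factor
REWARD = [1.0, 0.0]              # reward received in each state
P = [[0.5, 0.5], [0.5, 0.5]]     # transition probabilities P[s][s_next]
TRUE_V = [1.5, 0.5]              # solves V = REWARD + GAMMA * P @ V for this process


def run_td0(alpha, steps=100_000, burn_in=10_000, seed=0):
    """Run synchronous TD(0) with a constant step-size alpha.

    Returns the empirical mean and standard deviation of the iterate V(0)
    after burn-in, as a rough proxy for the stationary distribution.
    """
    rng = random.Random(seed)
    v = [0.0, 0.0]
    samples = []
    for t in range(steps):
        new_v = []
        for s in range(2):
            # Sample one next state for every state (synchronous update).
            s_next = 0 if rng.random() < P[s][0] else 1
            target = REWARD[s] + GAMMA * v[s_next]
            new_v.append((1 - alpha) * v[s] + alpha * target)
        v = new_v
        if t >= burn_in:
            samples.append(v[0])
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    for alpha in (0.5, 0.05):
        mean, std = run_td0(alpha)
        print(f"alpha={alpha}: mean V(0)={mean:.3f}, std={std:.3f}, true={TRUE_V[0]}")
```

Under these assumptions the iterates never settle to a single point for a fixed step-size; they keep fluctuating, but their long-run average should sit near the true value for both step-sizes, and the spread should be visibly smaller for the smaller step-size.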
Related papers
- Constant Stepsize Q-learning: Distributional Convergence, Bias And Extrapolation (2024)
- Conjugated Discrete Distributions For Distributional Reinforcement Learning (2021)
- A Distributional Perspective On Reinforcement Learning (2017)
- Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019)
- An Analysis Of Quantile Temporal-difference Learning (2023)
- A Differential Perspective On Distributional Reinforcement Learning (2025)
- A Comparative Analysis Of Expected And Distributional Reinforcement Learning (2019)
- Multi-step Reinforcement Learning: A Unifying Algorithm (2017)