Finite-sample Analysis For SARSA With Linear Function Approximation
2019 Β· Shaofeng Zou, Tengyu Xu, Yingbin Liang
Abstract
SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement learning. We investigate the SARSA algorithm with linear function approximation under the non-i.i.d.\ data, where a single sample trajectory is available. With a Lipschitz continuous policy improvement operator that is smooth enough, SARSA has been shown to converge asymptotically \cite\{perkins2003convergent,melo2008analysis\}. However, its non-asymptotic analysis is challenging and remains unsolved due to the non-i.i.d. samples and the fact that the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedures with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this type of stochastic approximation algorithms, which may be of independent interest. Using our bias characterization technique and a gradient descent type of
Authors
(none)
Tags
Stats
Related papers
- Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019)10.35
- Reinforcement Learning With Unbiased Policy Evaluation And Linear Function Approximation (2022)0.00
- Non-asymptotic Analysis Of Biased Stochastic Approximation Scheme (2019)0.00
- Finite Time Analysis Of Linear Two-timescale Stochastic Approximation With Markovian Noise (2020)0.00
- Finite-sample Analysis Of Stochastic Approximation Using Smooth Convex Envelopes (2020)0.00
- Provably Efficient Reinforcement Learning With Linear Function Approximation (2019)11.76
- Asynchronous Stochastic Approximations With Asymptotically Biased Errors And Deep Multi-agent Learning (2018)0.00
- Accelerated And Instance-optimal Policy Evaluation With Linear Function Approximation (2021)0.00