Finite Sample Analysis of the GTD Policy Evaluation Algorithms in Markov Setting
2018 · Yue Wang, Wei Chen, Yuting Liu, et al.
Abstract
In reinforcement learning (RL), one of the key components is policy evaluation, which aims to estimate the value function (i.e., the expected long-term accumulated reward) of a policy. With a good policy evaluation method, RL algorithms estimate the value function more accurately and can therefore find better policies. When the state space is large or continuous, *Gradient-based Temporal Difference (GTD)* policy evaluation algorithms with linear function approximation are widely used. Because collecting evaluation data is both time- and reward-consuming, a clear understanding of the finite sample performance of policy evaluation algorithms is very important for reinforcement learning. Under the assumption that the data are generated i.i.d., previous work provided a finite sample analysis of the GTD algorithms with constant step size by converting them into convex-concave saddle point problems. However, it is well known that the data are generated from Markov processes rather …
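The abstract combines two ingredients: a GTD update with linear function approximation, and data drawn from a Markov chain rather than i.i.d. The sketch below illustrates both under stated assumptions; it is not the paper's algorithm or analysis. It uses the GTD2 update, written as a stochastic primal-dual step on the saddle point problem min over θ, max over w of (b − Aθ)ᵀw − ½ wᵀMw, with constant step sizes and averaging of the iterates. The 3-state chain `P`, rewards `r`, features `Phi`, step sizes, and the helper names `td_fixed_point` and `gtd2_markov` are all hypothetical. Samples come from a single trajectory, so consecutive updates are correlated.

```python
import numpy as np


def td_fixed_point(P, r, Phi, gamma):
    """Solve A theta = b for the on-policy linear TD fixed point.

    A = Phi^T D (I - gamma P) Phi and b = Phi^T D r, where D holds the
    stationary distribution of the chain P on its diagonal.
    """
    n = P.shape[0]
    # Stationary distribution: mu^T P = mu^T with entries summing to 1.
    lhs = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    rhs = np.concatenate([np.zeros(n), [1.0]])
    mu = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    D = np.diag(mu)
    A = Phi.T @ D @ (Phi - gamma * P @ Phi)
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


def gtd2_markov(P, r, Phi, gamma, alpha, beta, n_steps, seed=0):
    """GTD2 with constant step sizes on ONE trajectory of the chain P.

    Each update is a stochastic primal-dual gradient step on
        min_theta max_w (b - A theta)^T w - 0.5 w^T M w,  M = Phi^T D Phi.
    Consecutive samples come from the same Markov chain, so they are
    correlated rather than i.i.d. Returns the average of the primal
    iterates over the second half of the run (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    n, d = Phi.shape
    cum_P = np.cumsum(P, axis=1)
    theta = np.zeros(d)  # primal variable: value-function weights
    w = np.zeros(d)      # dual (auxiliary) variable
    theta_sum = np.zeros(d)
    burn_in = n_steps // 2
    s = 0
    for t in range(n_steps):
        # Draw s' ~ P[s, :]; min() guards against round-off in cum_P.
        idx = np.searchsorted(cum_P[s], rng.random(), side="right")
        s_next = min(int(idx), n - 1)
        phi, phi_next = Phi[s], Phi[s_next]
        delta = r[s] + gamma * phi_next @ theta - phi @ theta  # TD error
        # Simultaneous updates: both right-hand sides use the old theta and w.
        theta, w = (
            theta + alpha * (phi - gamma * phi_next) * (phi @ w),  # descent in theta
            w + beta * (delta - phi @ w) * phi,                    # ascent in w
        )
        if t >= burn_in:
            theta_sum += theta
        s = s_next
    return theta_sum / (n_steps - burn_in)


# Hypothetical 3-state chain with 2 linear features per state.
P = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
r = np.array([1.0, 0.0, 2.0])
Phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gamma = 0.8

theta_star = td_fixed_point(P, r, Phi, gamma)
theta_bar = gtd2_markov(P, r, Phi, gamma, alpha=0.02, beta=0.02, n_steps=100_000)
print("TD fixed point:", theta_star)
print("averaged GTD2 :", theta_bar)
```

For this chain the stationary distribution is uniform and the exact fixed point is θ* = (2.0, 2.2)/0.68 ≈ (2.94, 3.24), so the averaged GTD2 iterate can be checked against it. Replacing the trajectory sampler with independent draws of `s` from the stationary distribution would give the i.i.d. setting that the earlier analysis assumed.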
Related papers
- Finite-Sample Analysis of Proximal Gradient TD Algorithms (2020)
- Finite-Sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise (2020)
- High-Probability Sample Complexities for Policy Evaluation with Linear Function Approximation (2023)
- Regularized Gradient Temporal-Difference Learning (2026)
- Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation (2019)
- Online Bootstrap Inference for Policy Evaluation in Reinforcement Learning (2021)
- Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling (2020)
- Policy Evaluation and Temporal-Difference Learning in Continuous Time and Space: A Martingale Approach (2021)