On A Convergent Off-policy Temporal Difference Learning Algorithm In On-line Learning Environment
2016 · Prasenjit Karmakar, Rajkumar Maity, Shalabh Bhatnagar
Abstract
In this paper we provide a rigorous convergence analysis of an "off"-policy temporal difference learning algorithm with linear function approximation and per-time-step linear computational complexity in an "online" learning environment. The algorithm considered here is TDC with importance weighting, introduced by Maei et al. We support our theoretical results with empirical results on standard off-policy counterexamples.
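For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of one commonly stated form of the TDC update with importance weighting under linear function approximation. It is an illustration under stated assumptions, not the paper's own pseudocode: the function name `tdc_step`, the argument names, and the fixed step sizes `alpha` and `beta` are hypothetical, and the exact update analysed in the paper may differ in detail. Each step costs time linear in the number of features, which is the "per-time-step linear computational complexity" the abstract refers to.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length feature or weight vectors."""
    return sum(x * y for x, y in zip(a, b))


def tdc_step(
    theta: List[float],     # value-function weights (slow iterate)
    w: List[float],         # auxiliary weights (fast iterate)
    phi: List[float],       # features of the current state
    phi_next: List[float],  # features of the next state
    reward: float,
    rho: float,             # importance weight pi(a|s) / mu(a|s), assumed given
    gamma: float,           # discount factor
    alpha: float,           # step size for theta (assumed constant here)
    beta: float,            # step size for w (assumed constant here)
) -> Tuple[List[float], List[float]]:
    """One importance-weighted TDC update; O(d) work for d features."""
    # TD error under the current linear value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    phi_w = dot(phi, w)

    # theta moves along the importance-weighted TD direction plus a
    # gradient-correction term involving the auxiliary weights w.
    new_theta = [
        t + alpha * rho * (delta * f - gamma * phi_w * fn)
        for t, f, fn in zip(theta, phi, phi_next)
    ]
    # w tracks a least-squares estimate of the weighted TD error.
    new_w = [wi + beta * (rho * delta - phi_w) * f for wi, f in zip(w, phi)]
    return new_theta, new_w
```

In a two-time-scale analysis the two step sizes are usually decreasing sequences with `w` updated on the faster scale; constants are used here only to keep the sketch short. Calling `tdc_step` once per observed transition, with `rho` computed from the target and behaviour policies, gives the online form of the method.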
Related papers
- Backstepping Temporal Difference Learning (2023)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Two Time-scale Off-policy TD Learning: Non-asymptotic Analysis Over Markovian Samples (2019)
- Variance-reduced Off-policy TDC Learning: Non-asymptotic Convergence Analysis (2020)
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)
- O\(^2\)TD: (near)-optimal Off-policy TD Learning (2017)
- Analysis Of Off-policy \(n\)-step TD-learning With Linear Function Approximation (2025)