Policy Evaluation And Temporal-difference Learning In Continuous Time And Space: A Martingale Approach
2021 · Yanwei Jia, Xun Yu Zhou
Abstract
We propose a unified framework to study policy evaluation (PE) and the associated temporal-difference (TD) methods for reinforcement learning in continuous time and space. We show that PE is equivalent to maintaining the martingale condition of a process. From this perspective, we find that the mean-square TD error approximates the quadratic variation of the martingale and thus is not a suitable objective for PE. We present two methods that use the martingale characterization to design PE algorithms. The first minimizes a "martingale loss function", whose solution is proved to be the best approximation of the true value function in the mean-square sense. This method interprets the classical gradient Monte Carlo algorithm. The second method is based on a system of equations, called the "martingale orthogonality conditions", with test functions. Solving these equations in different ways recovers various classical TD algorithms, such as TD(λ), LSTD, and GTD. Different choi…
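The abstract's two central claims (the martingale loss recovers the value function, while the summed squared TD error tracks the martingale's quadratic variation) can be illustrated with a toy sketch in Python using only the standard library. Everything in it is an illustrative assumption, not the paper's experiment: the state follows dX = σ dW, there is no running reward and no discounting, the terminal reward is h(x) = x², and the hypothetical parametric family is J^θ(t, x) = x² + θ(T − t). The true value function is then J(t, x) = x² + σ²(T − t), so the correct parameter is θ* = σ². The hypothetical function `martingale_loss_theta` minimizes a discretized martingale loss, taken here in the assumed form Σ (h(X_T) − J^θ(t_i, X_{t_i}))² Δt, which regresses J^θ on realized returns much like gradient Monte Carlo. The hypothetical function `mean_sq_td_error` sums squared TD errors along each path; at the true value function this does not vanish but stays close to the expected quadratic variation of the martingale J(t, X_t), which for this toy model is 4σ²(x0²T + σ²T²/2).

```python
import math
import random


def simulate_paths(n_paths, n_steps, T, sigma, x0, seed=0):
    """Simulate dX = sigma dW on [0, T] with exact Gaussian increments."""
    rng = random.Random(seed)
    dt = T / n_steps
    step_sd = sigma * math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        x = x0
        path = [x]
        for _ in range(n_steps):
            x += step_sd * rng.gauss(0.0, 1.0)
            path.append(x)
        paths.append(path)
    return paths


def martingale_loss_theta(paths, T):
    """Closed-form minimiser of the (assumed) discretised martingale loss
        sum_paths sum_i (h(X_T) - J_theta(t_i, X_{t_i}))^2 * dt
    for the hypothetical family J_theta(t, x) = x^2 + theta * (T - t),
    terminal reward h(x) = x^2 and zero running reward."""
    n_steps = len(paths[0]) - 1
    dt = T / n_steps
    num, den = 0.0, 0.0
    for path in paths:
        target = path[-1] ** 2  # h(X_T): realised return from t_i onward
        for i in range(n_steps):
            tau = T - i * dt  # time to go, T - t_i
            # residual = (target - X_i^2) - theta * tau, linear in theta
            num += (target - path[i] ** 2) * tau
            den += tau * tau
    return num / den  # the common factor dt cancels


def mean_sq_td_error(paths, T, theta):
    """Per-path average of sum_i (J(t_{i+1}, X_{i+1}) - J(t_i, X_i))^2,
    i.e. squared TD errors (zero running reward) summed along a path."""
    n_steps = len(paths[0]) - 1
    dt = T / n_steps
    total = 0.0
    for path in paths:
        for i in range(n_steps):
            td = (path[i + 1] ** 2 - path[i] ** 2) - theta * dt
            total += td * td
    return total / len(paths)


T, sigma, x0 = 1.0, 0.5, 1.0
paths = simulate_paths(n_paths=4000, n_steps=50, T=T, sigma=sigma, x0=x0)

theta_hat = martingale_loss_theta(paths, T)
print("martingale-loss estimate:", theta_hat, "target:", sigma ** 2)

# Squared TD errors evaluated at the TRUE value function (theta = sigma^2).
td_at_truth = mean_sq_td_error(paths, T, sigma ** 2)
# Expected quadratic variation of J(t, X_t): 4 sigma^2 (x0^2 T + sigma^2 T^2 / 2)
qv = 4 * sigma ** 2 * (x0 ** 2 * T + sigma ** 2 * T ** 2 / 2)
print("summed squared TD error at truth:", td_at_truth, "E[QV]:", qv)
```

Because this toy family is linear in θ, the martingale loss has a closed-form minimizer; with a nonlinear J^θ one would instead take (stochastic) gradient steps on the same loss. The TD quantity remains near the expected quadratic variation rather than near zero even though θ is exact, which is the sense in which the mean-square TD error measures the wrong thing for PE.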
Related papers
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- Uncertainty Quantification For Markov Chain Induced Martingales With Application To Temporal Difference Learning (2025)
- Temporal Difference Learning With Continuous Time And State In The Stochastic Setting (2022)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- On Generalized Bellman Equations And Temporal-difference Learning (2017)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019)