The Surprising Efficiency Of Temporal Difference Learning For Rare Event Prediction
2024 · Xiaoou Cheng, Jonathan Weare
Abstract
We quantify the efficiency of temporal difference (TD) learning over the direct, or Monte Carlo (MC), estimator for policy evaluation in reinforcement learning, with an emphasis on estimation of quantities related to rare events. Policy evaluation is complicated in the rare event setting by the long timescale of the event and by the need for *relative accuracy* in estimates of very small values. Specifically, we focus on least-squares TD (LSTD) prediction for finite state Markov chains, and show that LSTD can achieve relative accuracy far more efficiently than MC. We prove a central limit theorem for the LSTD estimator and upper bound the *relative asymptotic variance* by simple quantities characterizing the connectivity of states relative to the transition probabilities between them. Using this bound, we show that, even when both the timescale of the rare event and the relative accuracy of the MC estimator are exponentially large in the number of states, LSTD maintains a fixed level o
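To make the comparison in the abstract concrete, here is a minimal sketch, not the authors' code or experiments. It assumes a toy birth-death chain on states 0..n-1 that steps right with probability p and left otherwise, and estimates the probability of reaching state n-1 before state 0. The function names (`exact_committor`, `lstd_committor`, `sample_transitions`, `mc_committor`, `demo`) and all parameter values are hypothetical choices for illustration. The LSTD variant shown uses one-hot (tabular) features and one-step transitions sampled from every interior state. The MC variant runs full trajectories from a single start state.

```python
import numpy as np


def exact_committor(n, p):
    """Gambler's-ruin formula: P(hit state n-1 before state 0 | start at i).

    Assumes p != 0.5 (the formula divides by zero in the symmetric case).
    """
    r = (1.0 - p) / p
    i = np.arange(n)
    return (1.0 - r**i) / (1.0 - r ** (n - 1))


def lstd_committor(transitions, n):
    """Tabular LSTD estimate from (state, next_state) pairs.

    States 0 and n-1 are the absorbing sets; each interior state 1..n-2
    gets a one-hot feature, and boundary states get a zero feature.
    """
    d = n - 2
    M = np.zeros((d, d))
    b = np.zeros(d)
    for x, y in transitions:
        M[x - 1, x - 1] += 1.0      # phi(x) phi(x)^T
        if 1 <= y <= n - 2:
            M[x - 1, y - 1] -= 1.0  # -phi(x) phi(y)^T for interior y
        elif y == n - 1:
            b[x - 1] += 1.0         # reward 1 for entering the target state
    u = np.zeros(n)
    u[-1] = 1.0
    u[1:-1] = np.linalg.solve(M, b)
    return u


def sample_transitions(n, p, per_state, rng):
    """Draw `per_state` one-step transitions from every interior state."""
    out = []
    for x in range(1, n - 1):
        steps = np.where(rng.random(per_state) < p, 1, -1)
        out.extend((x, x + int(s)) for s in steps)
    return out


def mc_committor(n, p, start, n_traj, rng):
    """Direct MC: fraction of full trajectories from `start` reaching n-1 first."""
    hits = 0
    for _ in range(n_traj):
        x = start
        while 0 < x < n - 1:
            x += 1 if rng.random() < p else -1
        hits += int(x == n - 1)
    return hits / n_traj


def demo():
    n, p = 12, 0.25  # illustrative values; the event is rare from state 1
    rng = np.random.default_rng(0)
    u_true = exact_committor(n, p)
    u_lstd = lstd_committor(sample_transitions(n, p, 2000, rng), n)
    u_mc = mc_committor(n, p, 1, 20000, rng)
    print("exact:", u_true[1], "LSTD:", u_lstd[1], "MC:", u_mc)


if __name__ == "__main__":
    demo()
```

With these illustrative parameters the exact probability from state 1 is about 1e-5, so a direct MC run with 20000 trajectories will usually record no successes at all, while the LSTD solve combines many short transitions into an estimate at every state. This toy setup only illustrates the mechanism and says nothing about the variance bounds proved in the paper.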
Related papers
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- On The Statistical Benefits Of Temporal Difference Learning (2023)
- Discerning Temporal Difference Learning (2023)
- The Statistical Benefits Of Quantile Temporal-difference Learning For Value Estimation (2023)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Reanalysis Of Variance Reduced Temporal Difference Learning (2020)
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)