Adaptive Lambda Least-squares Temporal Difference Learning
2016 · Timothy A. Mann, Hugo Penedones, Shie Mannor, et al.
Abstract
Temporal Difference learning or TD(\(\lambda\)) is a fundamental algorithm in the field of reinforcement learning. However, setting TD's \(\lambda\) parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the \(\lambda\) selection problem as a bias-variance trade-off where the solution is the value of \(\lambda\) that leads to the smallest Mean Squared Value Error (MSVE). To solve this trade-off we suggest applying Leave-One-Trajectory-Out Cross-Validation (LOTO-CV) to search the space of \(\lambda\) values. Unfortunately, this approach is too computationally expensive for most practical applications. For Least Squares TD (LSTD) we show that LOTO-CV can be implemented efficiently to automatically tune \(\lambda\) and apply function optimization methods to efficiently search the space of \(\lambda\) values. The resulting algorithm, ALLSTD, is parameter free and our experiments demonstrate that ALLSTD is significantly computationally
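The naive LOTO-CV search that the abstract calls too expensive can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' ALLSTD implementation: it scans a fixed grid of \(\lambda\) values rather than using function optimization, it scores each held-out trajectory against its own Monte Carlo returns, and the names `lstd_stats`, `mc_returns`, `loto_cv_lambda`, and the `ridge` regularizer are hypothetical helpers introduced here.

```python
import numpy as np

def lstd_stats(phis, rewards, gamma, lam):
    """Accumulate the LSTD(lambda) matrix A and vector b for one episodic trajectory."""
    n, d = phis.shape
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)  # eligibility trace
    for t in range(n):
        z = gamma * lam * z + phis[t]
        # Assumption: the state after the last step is terminal (zero features).
        next_phi = phis[t + 1] if t + 1 < n else np.zeros(d)
        A += np.outer(z, phis[t] - gamma * next_phi)
        b += z * rewards[t]
    return A, b

def mc_returns(rewards, gamma):
    """Discounted Monte Carlo return from every step of one trajectory."""
    G = np.zeros(len(rewards))
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        G[t] = acc
    return G

def loto_cv_lambda(trajectories, gamma, lambdas, ridge=1e-6):
    """Naive leave-one-trajectory-out search over a grid of lambda values.

    trajectories: list of (phis, rewards) pairs, phis of shape (T, d).
    Returns the lambda with the smallest mean held-out squared error,
    plus the score of every candidate.
    """
    d = trajectories[0][0].shape[1]
    scores = []
    for lam in lambdas:
        stats = [lstd_stats(p, r, gamma, lam) for p, r in trajectories]
        A_all = sum(A for A, _ in stats)
        b_all = sum(b for _, b in stats)
        err = 0.0
        for (phis, rewards), (A_i, b_i) in zip(trajectories, stats):
            # Fit on all trajectories except the held-out one.
            theta = np.linalg.solve(A_all - A_i + ridge * np.eye(d), b_all - b_i)
            # Score on the held-out trajectory against its observed returns.
            err += np.mean((phis @ theta - mc_returns(rewards, gamma)) ** 2)
        scores.append(err / len(trajectories))
    return lambdas[int(np.argmin(scores))], scores
```

Each candidate \(\lambda\) here requires one linear solve per held-out trajectory, which is the cost that motivates a more efficient implementation for LSTD.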
Related papers
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Discerning Temporal Difference Learning (2023)
- Finite Sample Analysis Of Linear Temporal Difference Learning With Arbitrary Features (2025)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)
- A Greedy Approach To Adapting The Trace Parameter For Temporal Difference Learning (2016)
- Meta-learning Eligibility Traces For More Sample Efficient Temporal Difference Learning (2020)
- Preferential Temporal Difference Learning (2021)
- Gradient Iterated Temporal-difference Learning (2026)