A Greedy Approach To Adapting The Trace Parameter For Temporal Difference Learning
2016 · Martha White, Adam White
Abstract
One of the main obstacles to broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key strategies in scaling up reinforcement learning algorithms. In this setting, we have effective and reasonably well understood algorithms for adapting the learning-rate parameter online during learning. Such meta-learning approaches can improve robustness of learning and enable specialization to the current task, improving learning speed. For temporal-difference learning algorithms, which we study here, there is yet another parameter, \(\lambda\), that similarly impacts learning speed and stability in practice. Unfortunately, unlike the learning-rate parameter, \(\lambda\) parametrizes the objective function that temporal-difference methods optimize. Different choices of \(\lambda\) produce different fixed-point solutions, and thus adapting \(\lambda\) onlin
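To make the role of \(\lambda\) concrete, the following is a minimal sketch of the standard linear TD(\(\lambda\)) update with accumulating eligibility traces. It is not the \(\lambda\)-adaptation method this paper proposes. The function names, the two-state episode, and the step-size and discount values are all hypothetical choices made for illustration.

```python
def td_lambda_step(w, z, x, x_next, reward, alpha, gamma, lam):
    """One linear TD(lambda) update with accumulating traces (illustrative)."""
    # Decay the eligibility trace by gamma * lambda, then add the current features.
    z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
    # Value estimates are dot products of the weights with the feature vectors.
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    # One-step temporal-difference error.
    delta = reward + gamma * v_next - v
    # Move the weights along the trace, scaled by the TD error.
    w = [wi + alpha * delta * zi for wi, zi in zip(w, z)]
    return w, z


def run_episode(lam, alpha=0.5, gamma=0.9):
    """Run one hypothetical two-step episode with one-hot features."""
    # Each transition is (features, next features, reward); the all-zero
    # next-feature vector stands for the terminal state.
    transitions = [
        ([1.0, 0.0], [0.0, 1.0], 1.0),
        ([0.0, 1.0], [0.0, 0.0], 1.0),
    ]
    w = [0.0, 0.0]
    z = [0.0, 0.0]
    for x, x_next, reward in transitions:
        w, z = td_lambda_step(w, z, x, x_next, reward, alpha, gamma, lam)
    return w


if __name__ == "__main__":
    for lam in (0.0, 0.8):
        print(lam, run_episode(lam))
```

With \(\lambda = 0\), the second reward updates only the weight of the state in which it was received. With \(\lambda = 0.8\), the trace also passes part of that update back to the first state, so the same experience yields different weights. This sketch shows only how \(\lambda\) enters the update rule; it does not demonstrate the fixed-point differences under function approximation that the abstract refers to.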
Related papers
- Adaptive Lambda Least-squares Temporal Difference Learning (2016)
- Meta-learning Eligibility Traces For More Sample Efficient Temporal Difference Learning (2020)
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020)
- Directly Estimating The Variance Of The \(\lambda\)-return Using Temporal-difference Methods (2018)
- On Generalized Bellman Equations And Temporal-difference Learning (2017)
- Discerning Temporal Difference Learning (2023)
- Revisiting A Design Choice In Gradient Temporal Difference Learning (2023)
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018)