A Greedy Approach To Adapting The Trace Parameter For Temporal Difference Learning

Abstract

One of the main obstacles to the broad application of reinforcement learning methods is the parameter sensitivity of our core learning algorithms. In many large-scale applications, online computation and function approximation represent key strategies in scaling up reinforcement learning algorithms. In this setting, we have effective and reasonably well-understood algorithms for adapting the learning-rate parameter online during learning. Such meta-learning approaches can improve the robustness of learning and enable specialization to the current task, improving learning speed. For temporal-difference learning algorithms, which we study here, there is yet another parameter, \(\lambda\), that similarly impacts learning speed and stability in practice. Unfortunately, unlike the learning-rate parameter, \(\lambda\) parametrizes the objective function that temporal-difference methods optimize. Different choices of \(\lambda\) produce different fixed-point solutions, and thus adapting \(\lambda\) onlin
