Temporal-difference Learning With Nonlinear Function Approximation: Lazy Training And Mean Field Regimes
2019 · Andrea Agazzi, Jianfeng Lu
Abstract
We discuss the approximation of the value function for infinite-horizon discounted Markov Reward Processes (MRP) with nonlinear functions trained with the Temporal-Difference (TD) learning algorithm. We first consider this problem under a certain scaling of the approximating function, leading to a regime called lazy training. In this regime, the parameters of the model vary only slightly during the learning process, a feature that has recently been observed in the training of neural networks, where the scaling we study arises naturally and is implicit in the initialization of their parameters. In the under- and over-parametrized frameworks, we prove exponential convergence of the above algorithm to local and global minimizers, respectively, in the lazy training regime. We then compare this scaling of the parameters to the mean-field regime, where the approximately linear behavior of the model is lost. Under this alternative scaling we prove that all fixed points of the dynamics in parameter
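
The lazy training behavior described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the random 5-state MRP, the one-hot state encoding, the one-hidden-layer tanh model with fixed output signs, the output scaling `alpha` with the initial output subtracted, and the step size `eta / alpha**2` (a choice common in lazy-training analyses). The update is the expected (noise-free) semi-gradient TD(0) step under the stationary distribution, not the sampled algorithm.

```python
import numpy as np

def make_mrp(n=5, gamma=0.8, seed=0):
    """Hypothetical random MRP: transition matrix P, rewards r,
    stationary distribution mu, and exact value function v_star."""
    rng = np.random.default_rng(seed)
    P = rng.random((n, n))
    P /= P.sum(axis=1, keepdims=True)      # make rows sum to 1
    r = rng.random(n)
    mu = np.full(n, 1.0 / n)
    for _ in range(1000):                  # power iteration for mu = mu P
        mu = mu @ P
    v_star = np.linalg.solve(np.eye(n) - gamma * P, r)
    return P, r, mu, v_star

def expected_td0(alpha, P, r, mu, gamma=0.8, m=64, eta=1.0,
                 steps=2000, seed=1):
    """Expected semi-gradient TD(0) for f(s) = alpha * (h_W(s) - h_W0(s)),
    where h_W(s) = sum_j a_j * tanh(W[j, s]) / sqrt(m) (one-hot states)."""
    n = len(r)
    rng = np.random.default_rng(seed)
    W0 = rng.standard_normal((m, n))       # initial hidden weights
    a = rng.choice([-1.0, 1.0], size=m)    # fixed output signs (assumption)

    def h(W):
        return a @ np.tanh(W) / np.sqrt(m)

    W = W0.copy()
    lr = eta / alpha**2                    # assumed lazy-regime step size
    for _ in range(steps):
        f = alpha * (h(W) - h(W0))
        delta = r + gamma * (P @ f) - f    # expected TD error per state
        # jac[j, s] = d f(s) / d W[j, s]; other partials vanish (one-hot input)
        jac = alpha * a[:, None] * (1.0 - np.tanh(W) ** 2) / np.sqrt(m)
        W += lr * jac * (mu * delta)[None, :]
    f = alpha * (h(W) - h(W0))
    rel_move = np.linalg.norm(W - W0) / np.linalg.norm(W0)
    return f, rel_move

P, r, mu, v_star = make_mrp()
for alpha in (1.0, 100.0):
    f, rel_move = expected_td0(alpha, P, r, mu)
    err = np.max(np.abs(f - v_star))
    print(f"alpha={alpha:g}  relative weight change={rel_move:.2e}  "
          f"max value error={err:.2e}")
```

Under these assumptions, a larger `alpha` should leave the weights much closer to their initialization while the scaled output still fits the value function, which is the qualitative signature of the lazy regime. The sketch does not cover the mean-field scaling.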
Related papers
- A Finite Time Analysis Of Temporal Difference Learning With Linear Function Approximation (2018) 0.00
- Adaptive Temporal Difference Learning With Linear Function Approximation (2020) 0.00
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019) 0.00
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019) 9.59
- Accelerated Distributional Temporal Difference Learning With Linear Function Approximation (2025) 0.00
- Differential Temporal Difference Learning (2018) 5.24
- Single-timescale Stochastic Nonconvex-concave Optimization For Smooth Nonlinear TD Learning (2020) 0.00
- Geometric Insights Into The Convergence Of Nonlinear TD Learning (2019) 0.00