Loss Dynamics Of Temporal Difference Reinforcement Learning
2023 · Blake Bordelon, Paul Masset, Henry Kuo, et al.
Abstract
Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success, there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics to study typical-case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis, in which averages over random trajectories are replaced with averages over temporally correlated Gaussian features, and we validate our assumptions on small-scale Markov decision processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent.
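The contrast between exact (expected) and episode-subsampled semi-gradient TD learning can be illustrated with a small simulation. This is a minimal sketch, not the authors' code or theory. It assumes a hypothetical five-state ring Markov reward process (reward 1 when leaving state 0), one-hot linear features, and illustrative values γ = 0.9 and η = 0.5. The helper names (`ring_mrp`, `td_expected`, `td_sampled`) are made up for this example. Both variants apply the semi-gradient TD(0) update w ← w + η δ_t φ(s_t), with δ_t = r_t + γ φ(s_{t+1})·w − φ(s_t)·w. The expected version averages δ exactly over all states and next states. The sampled version averages it over only a few simulated episodes per step.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ring_mrp(n_states=5, p_right=0.7):
    """Hypothetical toy Markov reward process: under a fixed policy the agent
    steps right with probability p_right, otherwise left, around a ring.
    Reward 1 is received when leaving state 0. The stationary distribution
    of this chain is uniform (the transition matrix is doubly stochastic)."""
    P = [[0.0] * n_states for _ in range(n_states)]
    for s in range(n_states):
        P[s][(s + 1) % n_states] += p_right
        P[s][(s - 1) % n_states] += 1.0 - p_right
    R = [1.0 if s == 0 else 0.0 for s in range(n_states)]
    return P, R


def true_values(P, R, gamma, iters=1000):
    """Solve V = R + gamma * P V by fixed-point (Bellman) iteration."""
    V = [0.0] * len(R)
    for _ in range(iters):
        V = [R[s] + gamma * dot(P[s], V) for s in range(len(R))]
    return V


def td_expected(Phi, P, R, gamma, lr, steps):
    """Full-batch semi-gradient TD(0): the TD error is averaged exactly over
    the uniform state distribution and over next states (no sampling noise)."""
    n, k = len(Phi), len(Phi[0])
    w = [0.0] * k
    for _ in range(steps):
        v = [dot(Phi[s], w) for s in range(n)]
        grad = [0.0] * k
        for s in range(n):
            delta = R[s] + gamma * dot(P[s], v) - v[s]  # expected TD error at s
            for j in range(k):
                grad[j] += delta * Phi[s][j] / n
        w = [w[j] + lr * grad[j] for j in range(k)]
    return w


def td_sampled(Phi, P, R, gamma, lr, steps, batch, horizon, rng):
    """Semi-gradient TD(0) where each update averages over `batch` sampled
    episodes of length `horizon`, i.e. a small subsample of all trajectories."""
    n, k = len(Phi), len(Phi[0])
    w = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for _ in range(batch):
            s = rng.randrange(n)  # start from the uniform stationary distribution
            for _ in range(horizon):
                s_next = rng.choices(range(n), weights=P[s])[0]
                # Semi-gradient: the bootstrapped target is held fixed, so only
                # phi(s) multiplies the TD error.
                delta = R[s] + gamma * dot(Phi[s_next], w) - dot(Phi[s], w)
                for j in range(k):
                    grad[j] += delta * Phi[s][j]
                s = s_next
        w = [w[j] + lr * grad[j] / (batch * horizon) for j in range(k)]
    return w


gamma, lr, steps = 0.9, 0.5, 1000
P, R = ring_mrp()
n = len(R)
Phi = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # one-hot
V = true_values(P, R, gamma)


def value_error(w):
    """Mean squared value error under the uniform state distribution."""
    return sum((V[s] - dot(Phi[s], w)) ** 2 for s in range(n)) / n


initial_err = value_error([0.0] * n)
exp_err = value_error(td_expected(Phi, P, R, gamma, lr, steps))
smp_err = value_error(
    td_sampled(Phi, P, R, gamma, lr, steps, batch=4, horizon=20, rng=random.Random(0))
)
print(f"initial {initial_err:.4f}  expected TD {exp_err:.2e}  sampled TD {smp_err:.2e}")
```

With a constant learning rate, the expected dynamics converge to the true values in this toy setup. The sampled version instead fluctuates around a nonzero noise floor, which one would expect to shrink with more episodes per batch or a smaller learning rate. This is only a qualitative picture of episode-subsampling noise, not a reproduction of the paper's learning-curve theory.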
Related papers
- Deterministic Limit Of Temporal Difference Reinforcement Learning For Stochastic Games (2018) · 12.93
- On The Statistical Benefits Of Temporal Difference Learning (2023) · 0.00
- Differential Temporal Difference Learning (2018) · 5.24
- Approximate Temporal Difference Learning Is A Gradient Descent For Reversible Policies (2018) · 0.00
- Learning Successor States And Goal-dependent Values: A Mathematical Viewpoint (2021) · 0.00
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019) · 9.59
- An MRP Formulation For Supervised Learning: Generalized Temporal Difference Learning Models (2024) · 0.00
- Robust And Adaptive Temporal-difference Learning Using An Ensemble Of Gaussian Processes (2021) · 0.00