Target-based Temporal Difference Learning
2019 · Donghwan Lee, Niao He
Abstract
The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known about them from the theory side. In this work, we introduce a new family of target-based temporal difference (TD) learning algorithms and provide a theoretical analysis of their convergence. In contrast to standard TD-learning, target-based TD algorithms maintain two separate learning parameters: the target variable and the online variable. In particular, we introduce three members of the family, called averaging TD, double TD, and periodic TD, where the target variable is updated in an averaging, symmetric, or periodic fashion, mirroring the techniques used in deep Q-learning practice. We establish asymptotic convergence analyses for both averaging TD and double TD and a finite sample analysis for periodic TD. In addition, we also provide some simulation results showing potentially superior convergence of these target-based TD algorithms c
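To make the online/target split concrete, the following is a minimal sketch of tabular TD(0) in which the bootstrap value is read from a separate target variable. It is an illustration under stated assumptions, not the paper's exact update rules: the two-state Markov reward process, the function name `target_td`, and the parameters `tau` and `period` are all hypothetical. The "averaging" mode moves the target a small step toward the online variable at every iteration, and the "periodic" mode copies the online variable into the target every `period` steps. Double TD is not sketched here.

```python
def target_td(num_steps=4000, alpha=0.1, gamma=0.5,
              mode="averaging", tau=0.1, period=20):
    """Tabular TD(0) with a separate target variable (illustrative sketch)."""
    # Hypothetical deterministic two-state Markov reward process
    # (an assumption for illustration, not taken from the paper):
    # state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
    next_state = [1, 0]
    reward = [1.0, 0.0]

    online = [0.0, 0.0]  # online variable, updated at every step
    target = [0.0, 0.0]  # target variable, used only for bootstrapping

    s = 0
    for k in range(1, num_steps + 1):
        s_next = next_state[s]
        # The TD error bootstraps from the target variable, not the online one.
        td_error = reward[s] + gamma * target[s_next] - online[s]
        online[s] += alpha * td_error

        if mode == "averaging":
            # Move the target a fraction tau toward the online variable.
            target = [t + tau * (o - t) for o, t in zip(online, target)]
        elif mode == "periodic":
            # Copy the online variable into the target every `period` steps.
            if k % period == 0:
                target = list(online)
        else:
            raise ValueError("mode must be 'averaging' or 'periodic'")

        s = s_next
    return online, target


if __name__ == "__main__":
    for mode in ("averaging", "periodic"):
        online, target = target_td(mode=mode)
        print(mode, online, target)
```

For this toy chain with `gamma = 0.5`, the true values solve V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), giving V(0) = 4/3 and V(1) = 2/3, so both modes can be checked against a known answer. Because the chain is deterministic, the sketch says nothing about the noisy, sampled setting that the paper's convergence analysis addresses.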
Related papers
- Simplifying Deep Temporal Difference Learning (2024)
- An Analysis Of Quantile Temporal-difference Learning (2023)
- Control Theoretic Analysis Of Temporal Difference Learning (2021)
- Preferential Temporal Difference Learning (2021)
- Neural Temporal-difference And Q-learning Provably Converge To Global Optima (2019)
- Gradient Temporal-difference Learning With Regularized Corrections (2020)
- Discerning Temporal Difference Learning (2023)
- Periodic Q-learning (2020)