The Role Of Target Update Frequencies In Q-learning
2026 · Simon Weissmann, Tilman Aach, Benedikt Wille, et al.
Abstract
The target network update frequency (TUF) is a central stabilization mechanism in (deep) Q-learning. However, its selection remains poorly understood and is often treated merely as another tunable hyperparameter rather than as a principled design decision. This work provides a theoretical analysis of target fixing in tabular Q-learning through the lens of approximate dynamic programming. We formulate periodic target updates as a nested optimization scheme in which each outer iteration applies an inexact Bellman optimality operator, approximated by a generic inner-loop optimizer. Rigorous theory yields a finite-time convergence analysis for the asynchronous sampling setting, specializing to stochastic gradient descent in the inner loop. Our results deliver an explicit characterization of the bias-variance trade-off induced by the target update period, showing how to optimally set this critical hyperparameter. We prove that constant target update schedules are suboptimal, incurring a l
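The nested scheme described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the small random MDP, uniform sampling of one state-action pair per step, the constant step size, the constant update period, and all function names. Each outer iteration freezes a copy of the Q-table as the target; the inner loop then takes stochastic gradient steps on the squared error between the sampled entry and the fixed-target Bellman backup.

```python
import numpy as np


def random_mdp(n_states=4, n_actions=2, seed=1):
    """Hypothetical toy MDP: dense random transitions, rewards in [0, 1]."""
    rng = np.random.default_rng(seed)
    # P[s, a] is a probability vector over next states.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return P, R


def optimal_q(P, R, gamma, n_iter=2000):
    """Reference solution via exact Bellman optimality iteration."""
    q = np.zeros_like(R)
    for _ in range(n_iter):
        q = R + gamma * P @ q.max(axis=1)
    return q


def periodic_target_q_learning(P, R, gamma, n_outer, period, step_size, seed=0):
    """Tabular Q-learning whose target table is frozen for `period` steps."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    q = np.zeros((n_states, n_actions))
    for _ in range(n_outer):
        target = q.copy()  # outer iteration: fix the target
        for _ in range(period):
            # Asynchronous sampling: one (s, a) entry is updated per step.
            s = rng.integers(n_states)
            a = rng.integers(n_actions)
            s_next = rng.choice(n_states, p=P[s, a])
            # Sampled Bellman backup computed from the frozen target.
            y = R[s, a] + gamma * target[s_next].max()
            # SGD step on 0.5 * (q[s, a] - y) ** 2.
            q[s, a] += step_size * (y - q[s, a])
    return q


if __name__ == "__main__":
    P, R = random_mdp()
    q_star = optimal_q(P, R, gamma=0.9)
    q = periodic_target_q_learning(
        P, R, gamma=0.9, n_outer=100, period=400, step_size=0.05
    )
    print("max abs error:", np.max(np.abs(q - q_star)))
```

In this sketch, a longer `period` lets the inner loop approximate each Bellman backup more closely but spends more samples per outer iteration, while a shorter one updates the target more often from a noisier inner solution. The sketch uses a constant period only for simplicity.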
Related papers
- Periodic Q-learning (2020) · 0.00
- Target-based Temporal Difference Learning (2019) · 0.00
- Breaking The Deadly Triad With A Target Network (2021) · 0.00
- T-soft Update Of Target Network For Deep Reinforcement Learning (2020) · 13.39
- A Unifying View Of Linear Function Approximation In Off-policy RL Through Matrix Splitting And Preconditioning (2025) · 0.00
- Qf-tuner: Breaking Tradition In Reinforcement Learning (2024) · 0.00
- Deep Q-learning: A Robust Control Approach (2022) · 9.23
- Temporal-difference Value Estimation Via Uncertainty-guided Soft Updates (2021) · 0.00