Periodic Q-learning
2020 · Donghwan Lee, Niao He
Abstract
The use of target networks is a common practice in deep reinforcement learning for stabilizing training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning algorithm (PQ-learning for short), which resembles the technique used in deep Q-learning for solving infinite-horizon discounted Markov decision processes (DMDP) in the tabular setting. PQ-learning maintains two separate Q-value estimates: the online estimate and the target estimate. The online estimate follows the standard Q-learning update, while the target estimate is updated periodically. In contrast to standard Q-learning, PQ-learning enjoys a simple finite-time analysis and achieves better sample complexity for finding an epsilon-optimal policy. Our result provides a preliminary justification of the effectiveness of utilizing target estimates or networks in Q-learning algorithms.
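The two-estimate scheme described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' exact algorithm: the function name `periodic_q_learning`, the uniform sampling of state-action pairs, the constant step size, and the known transition table `P` used as a simulator are all assumptions made for illustration. The one feature taken from the abstract is that the online table bootstraps from a frozen target table, which is overwritten only at the end of each period.

```python
import random


def periodic_q_learning(P, R, gamma, outer_iters, inner_steps, step_size, seed=0):
    """Tabular sketch of Q-learning with a periodically updated target table.

    P[s][a] is a list of next-state probabilities and R[s][a] a scalar reward.
    Uniform (s, a) sampling and a constant step size are illustrative
    assumptions, not details taken from the paper.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    target = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(outer_iters):
        # The online estimate starts each period from the current target.
        online = [row[:] for row in target]
        for _ in range(inner_steps):
            s = rng.randrange(n_states)
            a = rng.randrange(n_actions)
            s_next = rng.choices(range(n_states), weights=P[s][a])[0]
            # Bootstrap from the frozen target table, not the online table.
            td_target = R[s][a] + gamma * max(target[s_next])
            online[s][a] += step_size * (td_target - online[s][a])
        # Periodic update: the target becomes a copy of the online estimate.
        target = [row[:] for row in online]
    return target


# Hypothetical two-state, two-action deterministic MDP used only as a demo.
P_demo = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
R_demo = [[0.0, 0.0], [1.0, 0.0]]  # only staying in state 1 is rewarded

Q_demo = periodic_q_learning(P_demo, R_demo, gamma=0.5,
                             outer_iters=30, inner_steps=200, step_size=0.5)
greedy_demo = [max(range(2), key=lambda a: Q_demo[s][a]) for s in range(2)]
```

Because the target is held fixed within a period, the inner loop solves a simple regression toward a fixed Bellman backup, and each outer iteration then behaves like an approximate value-iteration step. In this toy problem the greedy policy moves from state 0 to state 1 and then stays there.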
Related papers
- Target-based Temporal Difference Learning (2019)
- The Role Of Target Update Frequencies In Q-learning (2026)
- Online Target Q-learning With Reverse Experience Replay: Efficiently Finding The Optimal Policy For Linear MDPs (2021)
- Breaking The Deadly Triad With A Target Network (2021)
- Multi-timescale Ensemble Q-learning For Markov Decision Process Policy Optimization (2024)
- Stabilizing Q-learning With Linear Architectures For Provably Efficient Learning (2022)
- A Theoretical Analysis Of Deep Q-learning (2019)
- Iterated \(Q\)-network: Beyond One-step Bellman Updates In Deep Reinforcement Learning (2024)