Finite-time Analysis Of Simultaneous Double Q-learning
2024 · Hyunjun Na, Donghwan Lee
Abstract
\(Q\)-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias caused by the maximization step in its update. To address this issue, double \(Q\)-learning maintains two independent \(Q\)-estimators, one of which is randomly selected and updated at each step of the learning process. This paper proposes a modified version of double \(Q\)-learning, called simultaneous double \(Q\)-learning (SDQ), together with its finite-time analysis. SDQ eliminates the random selection between the two \(Q\)-estimators, and this modification lets us analyze double \(Q\)-learning through a novel switching system framework that facilitates efficient finite-time analysis. Empirical studies demonstrate that SDQ converges faster than double \(Q\)-learning while retaining its ability to mitigate the maximization bias. Finally, we derive a finite-time expected error bound for SDQ.
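The abstract contrasts the two update schemes but does not spell out SDQ's update rule. The sketch below shows tabular double \(Q\)-learning in its standard coin-flip form next to an *assumed* SDQ-style step in which both estimators are updated on every transition, each evaluating its own greedy action with the other table. The SDQ rule, the function names (`double_q_update`, `simultaneous_double_q_update`, `argmax`) and the list-of-lists table layout are illustrative assumptions rather than the paper's definitions, and terminal-state handling is omitted for brevity.

```python
import random
from typing import List

# Hypothetical tabular layout: q[state][action] -> float.
QTable = List[List[float]]


def argmax(values: List[float]) -> int:
    """Index of the largest entry; ties go to the lowest index."""
    return max(range(len(values)), key=values.__getitem__)


def double_q_update(qa: QTable, qb: QTable, s: int, a: int, r: float,
                    s_next: int, alpha: float, gamma: float,
                    rng: random.Random) -> None:
    """Standard double Q-learning step: a coin flip picks ONE table to update."""
    if rng.random() < 0.5:
        # Update Q^A: choose the greedy action with Q^A, evaluate it with Q^B.
        target = r + gamma * qb[s_next][argmax(qa[s_next])]
        qa[s][a] += alpha * (target - qa[s][a])
    else:
        # Update Q^B: choose the greedy action with Q^B, evaluate it with Q^A.
        target = r + gamma * qa[s_next][argmax(qb[s_next])]
        qb[s][a] += alpha * (target - qb[s][a])


def simultaneous_double_q_update(qa: QTable, qb: QTable, s: int, a: int,
                                 r: float, s_next: int, alpha: float,
                                 gamma: float) -> None:
    """ASSUMED SDQ-style step: no coin flip, both tables updated every step."""
    # Compute both targets from the pre-update tables so neither update
    # sees the other's result (relevant when s_next == s).
    target_a = r + gamma * qb[s_next][argmax(qa[s_next])]
    target_b = r + gamma * qa[s_next][argmax(qb[s_next])]
    qa[s][a] += alpha * (target_a - qa[s][a])
    qb[s][a] += alpha * (target_b - qb[s][a])
```

Computing both targets before writing either table keeps the two updates genuinely simultaneous; the only structural difference from double \(Q\)-learning in this sketch is that the random branch disappears, which is the property the abstract credits with enabling the switching system analysis.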
Related papers
- Finite-time Analysis For Double Q-learning (2020)
- Simultaneous Double Q-learning With Conservative Advantage Learning For Actor-critic Methods (2022)
- On The Estimation Bias In Double Q-learning (2021)
- The Mean-squared Error Of Double Q-learning (2020)
- Finite-time Error Analysis Of Soft Q-learning: Switching System Approach (2024)
- Double Q(\(\sigma\)) And Q(\(\sigma, \lambda\)): Unifying Reinforcement Learning Control Algorithms (2017)
- A Discrete-time Switching System Analysis Of Q-learning (2021)
- Action Candidate Based Clipped Double Q-learning For Discrete And Continuous Action Tasks (2021)