Abstract

Q-learning has long been one of the most popular reinforcement learning algorithms, and theoretical analysis of Q-learning has been an active research topic for decades. Although researches on asymptotic convergence analysis of Q-learning have a long tradition, non-asymptotic convergence has only recently come under active study. The main goal of this paper is to investigate new finite-time analysis of asynchronous Q-learning under Markovian observation models via a control system viewpoint. In particular, we introduce a discrete-time time-varying switching system model of Q-learning with diminishing step-sizes for our analysis, which significantly improves recent development of the switching system analysis with constant step-sizes, and leads to \(\mathcal\{O\}\left( \sqrt\{\frac\{log k\}\{k\}\} \right)\) convergence rate that is comparable to or better than most of the state of the art results in the literature. In the mean while, a technique using the similarly transformation is new

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations2
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score3.58
  • arxiv keylim2022finite

Related papers