Finite-time Error Analysis Of Soft Q-learning: Switching System Approach
2024 · Narim Jeong, Donghwan Lee
Abstract
Soft Q-learning is a variation of Q-learning designed to solve entropy-regularized Markov decision problems, in which an agent aims to maximize the entropy-regularized value function. Despite its empirical success, theoretical studies of soft Q-learning remain limited. This paper offers a novel and unified finite-time, control-theoretic analysis of soft Q-learning algorithms. We focus on two types of soft Q-learning algorithms: one using the log-sum-exp operator and the other using the Boltzmann operator. Using dynamical switching system models, we derive novel finite-time error bounds for both algorithms. We hope that our analysis will deepen the current understanding of soft Q-learning by establishing connections with switching system models, and that it may pave the way for new frameworks in the finite-time analysis of other reinforcement learning algorithms.
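The two operators named in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the function names, the temperature parameter `beta`, and the convention that `beta` multiplies the Q-values are assumptions made for illustration, and the update rule is the generic Q-learning form with the max replaced by a soft operator.

```python
import math

def log_sum_exp(q_values, beta):
    """(1/beta) * log(sum_a exp(beta * Q(s, a))), shifted by the max for stability.

    Assumes beta > 0 (hypothetical temperature convention).
    """
    m = max(q_values)
    return m + math.log(sum(math.exp(beta * (q - m)) for q in q_values)) / beta

def boltzmann(q_values, beta):
    """Average of Q(s, a) weighted by softmax weights proportional to exp(beta * Q(s, a))."""
    m = max(q_values)
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total

def soft_q_update(Q, s, a, r, s_next, alpha, gamma, beta, operator):
    """One tabular update: Q(s,a) += alpha * (r + gamma * op(Q(s', .)) - Q(s,a)).

    Q is a list of per-state lists of action values; it is modified in place.
    """
    target = r + gamma * operator(Q[s_next], beta)
    Q[s][a] += alpha * (target - Q[s][a])
    return Q
```

For any `beta > 0`, the Boltzmann value is at most the maximum Q-value and the log-sum-exp value is at least the maximum, exceeding it by no more than `log(n) / beta` for `n` actions. Both approach the maximum as `beta` grows, which recovers the standard Q-learning target.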
Related papers
- Finite-time Analysis Of Minimax Q-learning For Two-player Zero-sum Markov Games: Switching System Approach (2023) 0.00
- A Discrete-time Switching System Analysis Of Q-learning (2021) 8.35
- Finite-time Analysis Of Simultaneous Double Q-learning (2024) 0.00
- Finite-time Analysis Of Asynchronous Q-learning Under Diminishing Step-size From Control-theoretic View (2022) 3.58
- Equivalence Between Policy Gradients And Soft Q-learning (2017) 0.00
- Finite-time Analysis For Double Q-learning (2020) 0.00
- Direct Soft-policy Sampling Via Langevin Dynamics (2026) 0.00
- Temporal-difference Value Estimation Via Uncertainty-guided Soft Updates (2021) 0.00