Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ

Abstract

Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among reinforcement learning algorithms, \(Q\)-learning has proven to be powerful in model-free settings. However, the extension of \(Q\)-learning to a model-based framework remains relatively unexplored. In this paper, we investigate the sample complexity of \(Q\)-learning when integrated with a model-based approach. The proposed algorithm learns both the model and the \(Q\)-values in an online manner. We establish a near-optimal sample complexity result for a broad range of step sizes.
