Provably Efficient And Agile Randomized Q-learning
2025 Β· He Wang, Xingyu Xu, Yuejie Chi
Abstract
While Bayesian-based exploration often demonstrates superior empirical performance compared to bonus-based methods in model-based reinforcement learning (RL), its theoretical understanding remains limited for model-free settings. Existing provable algorithms either suffer from computational intractability or rely on stage-wise policy updates which reduce responsiveness and slow down the learning process. In this paper, we propose a novel variant of Q-learning algorithm, refereed to as RandomizedQ, which integrates sampling-based exploration with agile, step-wise, policy updates, for episodic tabular RL. We establish an \(\widetilde\{O\}(\sqrt\{H^5SAT\})\) regret bound, where \(S\) is the number of states, \(A\) is the number of actions, \(H\) is the episode length, and \(T\) is the total number of episodes. In addition, we present a logarithmic regret bound under a mild positive sub-optimality condition on the optimal Q-function. Empirically, RandomizedQ exhibits outstanding performanc
Authors
(none)
Tags
Stats
Related papers
- Q-learning With UCB Exploration Is Sample Efficient For Infinite-horizon MDP (2019)0.00
- Q-learning With Posterior Sampling (2025)0.00
- Provably Efficient Exploration In Policy Optimization (2019)0.00
- A Nearly Optimal And Low-switching Algorithm For Reinforcement Learning With General Function Approximation (2023)0.00
- Uncertainty Quantification And Exploration For Reinforcement Learning (2019)6.77
- On Optimistic Versus Randomized Exploration In Reinforcement Learning (2017)0.00
- Minimax-optimal Reward-agnostic Exploration In Reinforcement Learning (2023)0.00
- When Simple Exploration Is Sample Efficient: Identifying Sufficient Conditions For Random Exploration To Yield PAC RL Algorithms (2018)0.00