A Nearly Optimal And Low-switching Algorithm For Reinforcement Learning With General Function Approximation
2023 Β· Heyang Zhao, Jiafan He, Quanquan Gu
Abstract
The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves minimax optimal regret of \(\tilde\{O\}(d\sqrt\{HK\})\) when \(K\) is sufficiently large and near-optimal policy switching cost of \(\tilde\{O\}(dH)\), with \(d\) being the eluder dimension of the function class, \(H\) being the planning horizon, and \(K\) being the number of episodes. Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learnin
Authors
(none)
Tags
Stats
Related papers
- Uniform-pac Bounds For Reinforcement Learning With Linear Function Approximation (2021)0.00
- Online Sub-sampling For Reinforcement Learning With General Function Approximation (2021)0.00
- Randomized Exploration For Reinforcement Learning With Multinomial Logistic Function Approximation (2024)0.00
- Smart Exploration In Reinforcement Learning Using Bounded Uncertainty Models (2025)0.00
- Improved Regret For Efficient Online Reinforcement Learning With Linear Function Approximation (2023)0.00
- Uncertainty Quantification And Exploration For Reinforcement Learning (2019)6.77
- Provably Efficient And Agile Randomized Q-learning (2025)0.00
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)0.00