Stabilizing Q-learning With Linear Architectures For Provably Efficient Learning
2022 · Andrea Zanette, Martin J. Wainwright
Abstract
The \(Q\)-learning algorithm is a simple and widely used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed even with linear function approximation. In practice, tools such as target networks and experience replay appear to be essential, but the individual contribution of each of these mechanisms is not well understood theoretically. This work proposes an exploration variant of the basic \(Q\)-learning protocol with linear function approximation. Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second-order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state-of-the-art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm, namely a space complexity independent of the number of steps elapsed. We show that the performance of
Related papers
- Regularized Q-learning (2022) 0.00
- Q-learning As A Monotone Scheme (2024) 0.00
- Provably Efficient Reinforcement Learning With Linear Function Approximation (2019) 11.76
- Online Target Q-learning With Reverse Experience Replay: Efficiently Finding The Optimal Policy For Linear MDPs (2021) 0.00
- Provably Efficient \(Q\)-learning With Function Approximation Via Distribution Shift Error Checking Oracle (2019) 0.00
- A Nearly Optimal And Low-switching Algorithm For Reinforcement Learning With General Function Approximation (2023) 0.00
- Nonstationary Reinforcement Learning With Linear Function Approximation (2020) 0.00
- Strategically Robust Multi-agent Reinforcement Learning With Linear Function Approximation (2026) 0.00