Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity
2021 · Ali Devran Kara, Naci Saldi, Serdar Yüksel
Abstract
Reinforcement learning algorithms often require finite state and action spaces in Markov decision processes (MDPs), also called controlled Markov chains, and various efforts have been made in the literature to extend such algorithms to continuous state and action spaces. In this paper, we show that under very mild regularity conditions (in particular, involving only weak continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions (called Quantized Q-Learning) converges to a limit. Furthermore, this limit satisfies an optimality equation, which leads to near-optimal policies with either explicit performance bounds or guaranteed asymptotic optimality. Our approach builds on (i) viewing quantization as a measurement kernel and thus a quantized MDP as a partially observed Markov decision process (POMDP), (ii) utilizing near optimality and convergence results of Q-learning for POMDPs, and (…
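A minimal sketch of the quantized Q-learning idea summarized above, under assumptions that are ours rather than the paper's: a one-dimensional state and action space [0, 1], hypothetical dynamics `step` and stage cost `cost`, a uniform state quantizer, a finite action grid, a uniform random exploration policy, and a step size of 1/(number of visits). The point it illustrates is that the Q-table is indexed by the quantized state while the cost and the next state come from the continuous system, so the bin index acts as an observation of the true state (the POMDP view described in the abstract).

```python
import random


def quantize(x, n_bins):
    """Uniform quantizer: map x in [0, 1] to a bin index in {0, ..., n_bins - 1}."""
    return min(int(x * n_bins), n_bins - 1)


def step(x, a, rng):
    """Hypothetical dynamics: drift toward the action plus Gaussian noise, clipped to [0, 1]."""
    x_next = x + 0.3 * (a - x) + rng.gauss(0.0, 0.05)
    return min(max(x_next, 0.0), 1.0)


def cost(x, a):
    """Hypothetical stage cost: distance from 0.5 plus a small control penalty."""
    return (x - 0.5) ** 2 + 0.1 * a ** 2


def quantized_q_learning(n_bins=10, n_actions=5, beta=0.9, n_steps=100_000, seed=0):
    """Run Q-learning on (quantized state, quantized action) pairs of a continuous system."""
    rng = random.Random(seed)
    actions = [k / (n_actions - 1) for k in range(n_actions)]  # finite action grid on [0, 1]
    q = [[0.0] * n_actions for _ in range(n_bins)]
    visits = [[0] * n_actions for _ in range(n_bins)]
    x = rng.random()  # true continuous state, never seen directly by the Q-table
    for _ in range(n_steps):
        i = quantize(x, n_bins)
        j = rng.randrange(n_actions)  # assumed exploration: uniform random actions
        a = actions[j]
        x_next = step(x, a, rng)
        i_next = quantize(x_next, n_bins)
        visits[i][j] += 1
        alpha = 1.0 / visits[i][j]  # assumed step size: 1 / visits to (i, j)
        # Cost is evaluated at the true state; bootstrap from the next bin's best Q-value.
        target = cost(x, a) + beta * min(q[i_next])
        q[i][j] = (1.0 - alpha) * q[i][j] + alpha * target
        x = x_next
    return q, actions


def greedy_policy(q, actions, n_bins):
    """Policy on the continuous state: quantize, then pick the cost-minimizing action."""
    def policy(x):
        row = q[quantize(x, n_bins)]
        return actions[min(range(len(row)), key=row.__getitem__)]
    return policy
```

For example, `q, actions = quantized_q_learning()` followed by `policy = greedy_policy(q, actions, 10)` yields a policy that acts on the continuous state. Refining `n_bins` and `n_actions` is the knob that the near-optimality results concern; this toy sketch does not attempt to verify those bounds.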
Related papers
- Minimax Optimal Q Learning with Nearest Neighbors (2023)
- Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability (2021)
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs (2024)
- Universal Approximation Theorem of Deep Q-Networks (2025)
- Confident Natural Policy Gradient for Local Planning in \(q_\pi\)-realizable Constrained MDPs (2024)
- How to Discretize Continuous State-Action Spaces in Q-Learning: A Symbolic Control Approach (2024)
- Efficient Learning for Entropy-Regularized Markov Decision Processes via Multilevel Monte Carlo (2025)
- Adaptive Discretization for Episodic Reinforcement Learning in Metric Spaces (2019)