Universal Approximation Theorem Of Deep Q-networks
2025 · Qian Qi
Abstract
We establish a continuous-time framework for analyzing Deep Q-Networks (DQNs) via stochastic control and Forward-Backward Stochastic Differential Equations (FBSDEs). Considering a continuous-time Markov Decision Process (MDP) driven by a square-integrable martingale, we analyze DQN approximation properties. We show that DQNs can approximate the optimal Q-function on compact sets with arbitrary accuracy and high probability, leveraging residual network approximation theorems and large deviation bounds for the state-action process. We then analyze the convergence of a general Q-learning algorithm for training DQNs in this setting, adapting stochastic approximation theorems. Our analysis emphasizes the interplay between DQN layer count, time discretization, and the role of viscosity solutions (primarily for the value function \(V^*\)) in addressing potential non-smoothness of the optimal Q-function. This work bridges deep reinforcement learning and stochastic control, offering insights in
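The abstract ties DQN layer count to time discretization and relies on residual network approximation results, but it does not spell out an architecture. As a rough illustration only, the sketch below assumes one common reading: each residual block acts like a single explicit Euler step of size `horizon / n_layers`, so adding layers refines the time grid. The parameter layout, the `tanh` activation, and the names `init_params` and `q_values` are hypothetical and are not taken from the paper.

```python
import numpy as np


def init_params(state_dim, n_actions, width, n_layers, rng):
    """Random weights for a small residual Q-network (hypothetical layout)."""
    return {
        "W_in": rng.normal(0.0, 1.0 / np.sqrt(state_dim), (state_dim, width)),
        "blocks": [
            (rng.normal(0.0, 1.0 / np.sqrt(width), (width, width)), np.zeros(width))
            for _ in range(n_layers)
        ],
        "W_out": rng.normal(0.0, 1.0 / np.sqrt(width), (width, n_actions)),
    }


def q_values(params, state, horizon=1.0):
    """Return one Q-value per action for a single state vector."""
    n_layers = len(params["blocks"])
    dt = horizon / n_layers  # assumed step size: shrinks as depth grows
    h = state @ params["W_in"]
    for W, b in params["blocks"]:
        # Residual update read as one explicit Euler step (an assumption).
        h = h + dt * np.tanh(h @ W + b)
    return h @ params["W_out"]


rng = np.random.default_rng(0)
params = init_params(state_dim=4, n_actions=3, width=16, n_layers=8, rng=rng)
q = q_values(params, np.ones(4))
greedy_action = int(np.argmax(q))  # action with the largest estimated Q-value
```

With all residual weights set to zero, every block reduces to the identity map and the network collapses to the linear map `state @ W_in @ W_out`, which is a convenient sanity check for the skip connections.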
Related papers
- Deep Q-learning: Theoretical Insights From An Asymptotic Analysis (2020) · 10.35
- A Theoretical Analysis Of Deep Q-learning (2019) · 0.00
- A Finite-time Analysis Of Q-learning With Neural Network Function Approximation (2019) · 0.00
- Deep Q-learning: A Robust Control Approach (2022) · 9.23
- Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms (2019) · 0.00
- On The Convergence And Sample Complexity Analysis Of Deep Q-networks With \(\epsilon\)-greedy Exploration (2023) · 3.58
- Convergent And Efficient Deep Q Network Algorithm (2021) · 0.00
- Q-learning For MDPs With General Spaces: Convergence And Near Optimality Via Quantization Under Weak Continuity (2021) · 0.00