Does DQN Learn?
2022 · Aditya Gopalan, Gugan Thoppe
Abstract
A primary requirement for any reinforcement learning method is that it should produce policies that improve upon the initial guess. In this work, we show that the widely used Deep Q-Network (DQN) fails to satisfy this minimal criterion -- even when it gets to see all possible states and actions infinitely often (a condition under which tabular Q-learning is guaranteed to converge to the optimal Q-value function). Our specific contributions are twofold. First, we numerically show that DQN often returns a policy that performs worse than the initial one. Second, we offer a theoretical explanation for this phenomenon in linear DQN, a simplified version of DQN that uses linear function approximation in place of neural networks while retaining the other key components such as \(\epsilon\)-greedy exploration, experience replay, and target network. Using tools from differential inclusion theory, we prove that the limit points of linear DQN correspond to fixed points of projected Bellman operat
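The components the abstract lists for linear DQN (linear function approximation, \(\epsilon\)-greedy exploration, experience replay, and a target network) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the two-state MDP `toy_step`, the one-hot `FEATURES`, and all hyperparameters are hypothetical choices made here. One-hot features are the tabular special case, picked so the sketch is easy to check; lower-dimensional features would give the function-approximation setting the abstract analyzes.

```python
import random
from collections import deque

# Hypothetical toy problem (not from the paper): 2 states, 2 actions.
# FEATURES[s][a] is the feature vector phi(s, a); one-hot here (tabular case).
N_STATES, N_ACTIONS = 2, 2
FEATURES = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],  # state 0, actions 0 and 1
    [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],  # state 1, actions 0 and 1
]


def toy_step(state, action, rng):
    """Action 0 stays, action 1 switches state; reward 1 for landing in state 1."""
    next_state = state if action == 0 else 1 - state
    return (1.0 if next_state == 1 else 0.0), next_state


def q_value(theta, phi):
    """Linear Q-value: inner product of the weights and the features."""
    return sum(w * x for w, x in zip(theta, phi))


def greedy_action(theta, features, state):
    """Action with the largest linear Q-value (ties go to the lowest index)."""
    return max(range(len(features[state])),
               key=lambda a: q_value(theta, features[state][a]))


def linear_dqn(features, step, n_steps=5000, gamma=0.9, alpha=0.05, epsilon=0.1,
               buffer_size=500, batch_size=8, target_period=50, seed=0):
    rng = random.Random(seed)
    n_actions = len(features[0])
    theta = [0.0] * len(features[0][0])   # online weights
    theta_target = list(theta)            # target-network weights
    replay = deque(maxlen=buffer_size)    # experience replay buffer
    state = 0
    for t in range(1, n_steps + 1):
        # Epsilon-greedy behaviour policy based on the online weights.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = greedy_action(theta, features, state)
        reward, next_state = step(state, action, rng)
        replay.append((state, action, reward, next_state))
        state = next_state

        # Semi-gradient update averaged over a uniformly sampled minibatch.
        grad = [0.0] * len(theta)
        for _ in range(batch_size):
            s, a, r, s2 = replay[rng.randrange(len(replay))]
            target = r + gamma * max(
                q_value(theta_target, features[s2][b]) for b in range(n_actions))
            td_error = target - q_value(theta, features[s][a])
            for i, x in enumerate(features[s][a]):
                grad[i] += td_error * x
        theta = [w + alpha * g / batch_size for w, g in zip(theta, grad)]

        # Periodically copy the online weights into the target network.
        if t % target_period == 0:
            theta_target = list(theta)
    return theta


theta = linear_dqn(FEATURES, toy_step)
policy = [greedy_action(theta, FEATURES, s) for s in range(N_STATES)]
print("weights:", theta)
print("greedy policy:", policy)
```

Comparing the return of the printed greedy policy against that of the initial policy (here, the greedy policy for all-zero weights) is the kind of check the abstract's "improves upon the initial guess" criterion refers to.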
Related papers
- A Theoretical Analysis Of Deep Q-learning (2019) 0.00
- Convergent And Efficient Deep Q Network Algorithm (2021) 0.00
- Deep Q-learning: Theoretical Insights From An Asymptotic Analysis (2020) 10.35
- On The Convergence And Sample Complexity Analysis Of Deep Q-networks With \(\epsilon\)-greedy Exploration (2023) 3.58
- \(\beta\)-DQN: Improving Deep Q-learning By Evolving The Behavior (2025) 0.00
- DQN With Model-based Exploration: Efficient Learning On Environments With Sparse Rewards (2019) 0.00
- Generalization And Regularization In DQN (2018) 0.00
- Online Target Q-learning With Reverse Experience Replay: Efficiently Finding The Optimal Policy For Linear MDPs (2021) 0.00