GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning
2021 · Jiajun Fan, Changnan Xiao, Yue Huang
Abstract
Deep Q Network (DQN) first opened the door to deep reinforcement learning (DRL) by combining deep learning (DL) with reinforcement learning (RL), and in doing so noticed that the distribution of the acquired data changes during training. DQN found that this property can make training unstable, so it proposed effective methods to handle its downside. Instead of focusing on the unfavourable aspects, we find it critical for RL to ease the gap between the estimated data distribution and the ground truth data distribution, which supervised learning (SL) fails to do. From this new perspective, we extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a more general version called Generalized Data Distribution Iteration (GDI). We see that a large number of RL algorithms and techniques can be unified into the GDI paradigm, each of which can be considered a special case of GDI. We provide theoretical proof of why GDI is be
Related papers
- On The Mistaken Assumption Of Interchangeable Deep Reinforcement Learning Implementations (2025) 0.00
- Distributed Deep Reinforcement Learning: An Overview (2020) 0.00
- A Practical Introduction To Deep Reinforcement Learning (2025) 0.00
- A Survey Of Deep Reinforcement Learning In Video Games (2019) 0.00
- Modern Deep Reinforcement Learning Algorithms (2019) 0.00
- Rethinking Adversarial Attacks In Reinforcement Learning From Policy Distribution Perspective (2025) 5.84
- When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning (2022) 0.00
- Generalization And Regularization In DQN (2018) 0.00