Soft Q Network
2019 · Jingbin Liu, Shuai Liu, Xinyang Gu
Abstract
Deep Q Network (DQN) is a very successful algorithm, yet the inherent problem of reinforcement learning, i.e. the exploit-explore balance, remains. In this work, we introduce entropy regularization into DQN and propose Soft Q Network (SQN). We find that the backup equation of soft Q learning can benefit from corrective feedback if we view the soft backup as policy improvement in the form of Q, instead of policy evaluation. We show that Soft Q Learning with Corrective Feedback (SQL-CF) underlies the on-policy nature of SQL and the equivalence of SQL and Soft Policy Gradient (SPG). With these insights, we propose an on-policy version of the deep Q learning algorithm, i.e. Q On-Policy (QOP). We experiment with QOP on a self-play environment called Google Research Football (GRF). The QOP algorithm exhibits great stability and efficiency in training GRF agents.
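For readers unfamiliar with the soft backup mentioned in the abstract, the following is a minimal sketch of the standard entropy-regularized (soft) Q learning backup for a discrete action space, as it is commonly written in the maximum-entropy reinforcement learning literature. It is not the paper's SQL-CF or QOP algorithm, whose details the abstract does not give. The function names and the parameters `alpha` (entropy temperature) and `gamma` (discount factor) are assumptions made for illustration.

```python
import math


def soft_value(q_values, alpha):
    """Soft state value: alpha * log(sum_a exp(Q(s, a) / alpha)).

    Illustrative helper (name and signature are assumptions).
    """
    m = max(q_values)  # subtract the max for numerical stability
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def soft_policy(q_values, alpha):
    """Boltzmann policy: pi(a|s) = exp((Q(s, a) - V_soft(s)) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]


def soft_backup_target(reward, next_q_values, alpha, gamma, done=False):
    """One-step soft Bellman target: r + gamma * V_soft(s')."""
    if done:
        # No bootstrapping from a terminal state.
        return reward
    return reward + gamma * soft_value(next_q_values, alpha)


if __name__ == "__main__":
    q_next = [1.0, 2.0, 0.5]
    print(soft_policy(q_next, alpha=0.5))
    print(soft_backup_target(1.0, q_next, alpha=0.5, gamma=0.99))
```

As `alpha` shrinks toward zero, the soft value approaches `max(q_values)` and the target reduces to the usual DQN target; larger `alpha` spreads probability across actions, which is how entropy regularization addresses the exploit-explore balance.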
Related papers
- Equivalence Between Policy Gradients And Soft Q-learning (2017)
- Intervention-assisted Policy Gradient Methods For Online Stochastic Queuing Network Optimization: Technical Report (2024)
- A Theoretical Analysis Of Deep Q-learning (2019)
- Human-level Control Through Directly-trained Deep Spiking Q-networks (2021)
- Policy Optimization Reinforcement Learning With Entropy Regularization (2019)
- Enhancing Q-value Updates In Deep Q-learning Via Successor-state Prediction (2025)
- Deep Reinforcement Learning With Spiking Q-learning (2022)
- Direct Soft-policy Sampling Via Langevin Dynamics (2026)