An Information-theoretic Optimality Principle For Deep Reinforcement Learning
2017 · Felix Leibfried, Jordi Grau-Moya, Haitham Bou-Ammar
Abstract
We methodologically address the problem of Q-value overestimation in deep reinforcement learning to handle high-dimensional state spaces efficiently. By adapting concepts from information theory, we introduce an intrinsic penalty signal encouraging reduced Q-value estimates. The resultant algorithm encompasses a wide range of learning outcomes containing deep Q-networks as a special case. Different learning outcomes can be demonstrated by tuning a Lagrange multiplier accordingly. We furthermore propose a novel scheduling scheme for this Lagrange multiplier to ensure efficient and robust learning. In experiments on Atari, our algorithm outperforms other algorithms (e.g. deep and double deep Q-networks) in terms of both game-play performance and sample complexity. These results remain valid under the recently proposed dueling architecture.
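The abstract does not spell out the penalty, so the following is only a minimal sketch under an assumption: that the information-theoretic penalty is a KL divergence between the acting policy and a prior policy, weighted by an inverse temperature beta that plays the role of the Lagrange multiplier (or its reciprocal, depending on convention). Under that assumption, the penalized value max_pi E_pi[Q] - (1/beta) KL(pi || prior) has the closed form (1/beta) log sum_a prior(a) exp(beta Q(a)). This value never exceeds max_a Q(a), which is one way a penalty can encourage reduced Q-value estimates. The function names, the `prior` argument and the `gamma` default are hypothetical illustrations, not the paper's implementation.

```python
import math
from typing import List, Sequence

# Assumed formulation: KL penalty toward a prior policy with inverse temperature beta.
# This is a generic free-energy ("soft") value, not necessarily the paper's exact update.


def free_energy_value(q_values: Sequence[float], prior: Sequence[float], beta: float) -> float:
    """Return (1/beta) * log(sum_a prior[a] * exp(beta * q[a])).

    Equals max over policies pi of E_pi[Q] - (1/beta) * KL(pi || prior).
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    # Work in log space; actions with zero prior mass contribute nothing.
    logits = [math.log(p) + beta * q for q, p in zip(q_values, prior) if p > 0]
    m = max(logits)  # shift by the largest logit (log-sum-exp trick) to avoid overflow
    return (m + math.log(sum(math.exp(z - m) for z in logits))) / beta


def soft_policy(q_values: Sequence[float], prior: Sequence[float], beta: float) -> List[float]:
    """Maximizer of the penalized objective: pi(a) proportional to prior[a] * exp(beta * q[a])."""
    v = free_energy_value(q_values, prior, beta)
    # Dividing by exp(beta * v) normalizes the distribution; zero-prior actions get zero mass.
    return [p * math.exp(beta * (q - v)) if p > 0 else 0.0 for q, p in zip(q_values, prior)]


def soft_td_target(
    reward: float,
    next_q: Sequence[float],
    prior: Sequence[float],
    beta: float,
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """One-step target r + gamma * V_beta(s'); a terminal transition does not bootstrap."""
    if done:
        return reward
    return reward + gamma * free_energy_value(next_q, prior, beta)


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]
    uniform = [1 / 3] * 3
    for beta in (0.01, 1.0, 100.0):
        print(beta, free_energy_value(q, uniform, beta), soft_policy(q, uniform, beta))
```

Running the loop shows the two limits of this construction: as beta grows, the value approaches max_a Q(a), the DQN-style target (consistent with the abstract's claim that deep Q-networks are a special case), and as beta shrinks toward zero it approaches the prior-weighted average of Q. How beta should move between these regimes during training is what the paper's scheduling scheme addresses; that schedule is not reproduced here.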
Related papers
- Parameter-free Reduction Of The Estimation Bias In Deep Reinforcement Learning For Deterministic Policy Gradients (2021)
- Estimation Error Correction In Deep Reinforcement Learning For Deterministic Actor-critic Methods (2021)
- On The Estimation Bias In Double Q-learning (2021)
- Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms (2019)
- Reducing Variance In Temporal-difference Value Estimation Via Ensemble Of Deep Networks (2022)
- Handling Cost And Constraints With Off-policy Deep Reinforcement Learning (2023)
- WD3: Taming The Estimation Bias In Deep Reinforcement Learning (2020)
- Expert Q-learning: Deep Reinforcement Learning With Coarse State Values From Offline Expert Examples (2021)