Suppressing Overestimation In Q-learning Through Adversarial Behaviors
2023 · Hyeann Lee, Donghwan Lee
Abstract
This paper proposes a new Q-learning algorithm with a dummy adversarial player, called dummy adversarial Q-learning (DAQ), that can effectively regulate the overestimation bias in standard Q-learning. With the dummy player, learning can be formulated as a two-player zero-sum game. DAQ unifies several Q-learning variants that control overestimation bias, such as maxmin Q-learning and minmax Q-learning (proposed in this paper), in a single framework. DAQ is a simple but effective way to suppress the overestimation bias through dummy adversarial behaviors, and it can be easily applied to off-the-shelf reinforcement learning algorithms to improve their performance. The finite-time convergence of DAQ is analyzed from an integrated perspective by adapting adversarial Q-learning. The performance of DAQ is demonstrated empirically in various benchmark environments.
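As a rough illustration of the two variants named in the abstract, the sketch below computes one-step bootstrap targets from several Q-estimates of the next state. It is not the authors' implementation: it assumes the dummy adversary's "action" is simply the choice of which of N estimates to use, and the function names, the nested-list layout of `q_next`, and the numbers are all hypothetical.

```python
# Hypothetical sketch: q_next[i][a] is the i-th estimate of Q(s', a).
# The "adversary" picks the estimate index i; the agent picks the action a.

def maxmin_target(q_next, reward, gamma):
    """Agent maximizes over actions after the adversary minimizes per action."""
    n_actions = len(q_next[0])
    worst_case = [min(q[a] for q in q_next) for a in range(n_actions)]
    return reward + gamma * max(worst_case)

def minmax_target(q_next, reward, gamma):
    """Adversary minimizes over estimates of each estimate's greedy value."""
    return reward + gamma * min(max(q) for q in q_next)

# Two estimates over two actions (made-up values).
q_next = [[1.0, 3.0],
          [2.0, 0.5]]

standard = 0.0 + 0.9 * max(q_next[0])   # single-estimate Q-learning target
maxmin = maxmin_target(q_next, 0.0, 0.9)
minmax = minmax_target(q_next, 0.0, 0.9)
print(standard, maxmin, minmax)
```

Under these assumptions both targets are no larger than the single-estimate target built from either estimate alone, and the maxmin target never exceeds the minmax target, which is the sense in which the adversarial choice pushes value estimates downward.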
Related papers
- On The Estimation Bias In Double Q-learning (2021)
- Automating Control Of Overestimation Bias For Reinforcement Learning (2021)
- On The Reduction Of Variance And Overestimation Of Deep Q-learning (2019)
- An Information-theoretic Optimality Principle For Deep Reinforcement Learning (2017)
- WD3: Taming The Estimation Bias In Deep Reinforcement Learning (2020)
- Parameter-free Reduction Of The Estimation Bias In Deep Reinforcement Learning For Deterministic Policy Gradients (2021)
- Estimation Error Correction In Deep Reinforcement Learning For Deterministic Actor-critic Methods (2021)
- Mitigating Off-policy Bias In Actor-critic Methods With One-step Q-learning: A Novel Correction Approach (2022)