Abstract

Inspired by the Double Q-learning algorithm, the Double-DQN (DDQN) algorithm was originally proposed to address the overestimation issue in the original DQN algorithm. DDQN has shown, both theoretically and empirically, the importance of decoupling action selection from action evaluation when computing target values, and it obtained these benefits with only a simple adaptation of DQN, the minimal possible change, as its authors put it. Nevertheless, DDQN appears to take a step back: the parameters of the policy network reappear in the target value function, although DQN had originally removed them to tackle the serious issue of moving targets and the instability they cause during learning. Therefore, this paper proposes three modifications to the DDQN algorithm with the aim of maintaining performance in terms of both stability and ov
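
To make the decoupling concrete, the following minimal sketch contrasts the standard DQN target with the standard Double-DQN target for a single transition. It does not implement the three modifications proposed in the paper, which this excerpt does not describe. The function names, the `done` flag, and the Q-values are hypothetical and chosen only for illustration.

```python
def dqn_target(reward, gamma, q_target_next, done=False):
    """Standard DQN target: the target network both selects and
    evaluates the next action, so no policy-network parameters appear."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def ddqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Standard Double-DQN target: the online (policy) network selects
    the next action and the target network evaluates it."""
    if done:
        return reward
    # Index of the greedy action under the online network.
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


# Hypothetical Q-values for the next state over three actions.
q_online_next = [1.0, 2.5, 2.0]  # online (policy) network estimates
q_target_next = [1.2, 1.8, 3.0]  # target network estimates

y_dqn = dqn_target(1.0, 0.9, q_target_next)
y_ddqn = ddqn_target(1.0, 0.9, q_online_next, q_target_next)
```

In this sketch `ddqn_target` depends on `q_online_next`, so its value shifts whenever the policy network is updated. That dependence is the reappearance of policy-network parameters in the target that the abstract refers to.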

Authors

(none)

Tags

  • Uncategorized

Stats

  • citations: 0
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: halat2021modified

Related papers