Abstract

The Deep Q-Network (DQN) algorithm was the first reinforcement learning algorithm to use a deep neural network to surpass human-level performance in a number of Atari learning environments. However, divergent and unstable behaviour has been a long-standing issue in DQNs. The unstable behaviour is often characterised by overestimation of the \(Q\)-values, commonly referred to as the overestimation bias. To address the overestimation bias and the divergent behaviour, a number of heuristic extensions have been proposed. Notably, multi-step updates have been shown to drastically reduce unstable behaviour while improving the agent's training performance. However, agents are often highly sensitive to the choice of the multi-step update horizon (\(n\)), and our empirical experiments show that a poorly chosen static value of \(n\) can in many cases lead to worse performance than single-step DQN. Inspired by the success of \(n\)-step DQN and the effects that multi-step updates have on ov
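
To make the role of the horizon \(n\) concrete, the following is a minimal sketch of the standard fixed-horizon \(n\)-step bootstrapped target, \(\sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n \max_a Q(s_{t+n}, a)\). It illustrates generic \(n\)-step DQN only, not the method proposed in this paper. The function name `n_step_target` and its parameters are hypothetical, and the bootstrap value \(\max_a Q(s_{t+n}, a)\) is assumed to be computed elsewhere (for example by a target network) and passed in.

```python
from typing import Sequence


def n_step_target(rewards: Sequence[float],
                  bootstrap_value: float,
                  gamma: float,
                  n: int) -> float:
    """Fixed-horizon n-step target (illustrative sketch).

    rewards         : r_t, r_{t+1}, ... observed from time t onward
    bootstrap_value : assumed to be max_a Q(s_{t+n}, a), supplied by the caller
    gamma           : discount factor
    n               : multi-step horizon (n = 1 gives the single-step DQN target)
    """
    if n < 1 or n > len(rewards):
        raise ValueError("n must satisfy 1 <= n <= len(rewards)")
    # Discounted sum of the first n observed rewards.
    target = 0.0
    for k in range(n):
        target += (gamma ** k) * rewards[k]
    # Bootstrap from the value estimate n steps ahead.
    return target + (gamma ** n) * bootstrap_value
```

With `n = 1` the target relies almost entirely on the bootstrapped estimate, while a larger `n` shifts weight onto observed rewards, which is one way to see why the choice of \(n\) affects how estimation errors propagate.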

Authors

(none)

Tags

  • Multi-Agent

Stats

  • citations: 27
  • S2 citations: n/a
  • github stars: 0
  • HF likes: 0
  • heat score: 10.85
  • arxiv key: ly2022elastic

Related papers