Abstract

This paper takes a step towards characterizing a new family of \textit{model-free} Deep Reinforcement Learning (DRL) algorithms. These algorithms jointly learn an approximation of the state-value function (\(V\)) alongside an approximation of the state-action value function (\(Q\)). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm which has been shown to outperform popular techniques such as Deep-Q-Learning (DQN) and Double-Deep-Q-Learning (DDQN) \cite{sabatelli2018deep}. To investigate why DQV's learning dynamics allow it to perform so well, we formulate a set of research questions which help us characterize this new family of DRL algorithms. Among our results, we present specific cases in which DQV's performance can be harmed and introduce a novel \textit{off-policy} DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the \(V\) and
