Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms
2019 · Matthia Sabatelli, Gilles Louppe, Pierre Geurts, et al.
Abstract
This paper makes one step forward towards characterizing a new family of \textit{model-free} Deep Reinforcement Learning (DRL) algorithms. The aim of these algorithms is to jointly learn an approximation of the state-value function (\(V\)), alongside an approximation of the state-action value function (\(Q\)). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm which has been shown to outperform popular techniques such as Deep-Q-Learning (DQN) and Double-Deep-Q-Learning (DDQN) \cite{sabatelli2018deep}. Intending to investigate why DQV's learning dynamics allow this algorithm to perform so well, we formulate a set of research questions which help us characterize a new family of DRL algorithms. Among our results, we present some specific cases in which DQV's performance can get harmed and introduce a novel \textit{off-policy} DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the \(V\) and