Qualitative Measurements Of Policy Discrepancy For Return-based Deep Q-network
2018 · Wenjia Meng, Qian Zheng, Long Yang, et al.
Abstract
The deep Q-network (DQN) and return-based reinforcement learning are two promising algorithms proposed in recent years. DQN has advanced the solution of complex sequential decision problems, while return-based algorithms make better use of sample trajectories. In this paper, we propose R-DQN, a general framework that combines DQN with most return-based reinforcement learning algorithms. We show that the performance of traditional DQN can be improved effectively by introducing return-based reinforcement learning. To further improve R-DQN, we design a strategy with two measurements that qualitatively measure the policy discrepancy. Moreover, we derive bounds on the two measurements within the proposed R-DQN framework. We show that algorithms using our strategy can accurately express the trace coefficient and achieve a better approximation of the return. The experiments, conducted on several representative tasks from the OpenAI Gym library, validate the effectiveness of the p
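The abstract does not give R-DQN's target or the form of its two measurements. As a hedged illustration of what a "return-based" target with a "trace coefficient" looks like, the sketch below computes the general off-policy return correction Q(x_0,a_0) + Σ_s γ^s (c_1⋯c_s) δ_s, with δ_s = r_s + γ E_π Q(x_{s+1},·) − Q(x_s,a_s), using the Retrace(λ) coefficient c = λ·min(1, π/μ) as a stand-in. The ratio π/μ is one simple way to quantify the discrepancy between target and behaviour policies; it is not the paper's measurement. The greedy target policy, the data layout, and the names `greedy_probs`, `retrace_coeff` and `return_based_target` are assumptions for illustration only.

```python
"""Sketch of a return-based target for a DQN-style learner.

Assumptions (not taken from the paper):
  * the target policy pi is greedy w.r.t. the target network's Q-values;
  * the trace coefficient follows Retrace(lambda): c = lam * min(1, pi / mu);
  * q_target[T] is the bootstrap row (pass zeros if the episode terminated).
"""
from typing import List, Sequence


def greedy_probs(q_row: Sequence[float]) -> List[float]:
    """Hypothetical target policy pi: all mass on the first argmax action."""
    best = max(range(len(q_row)), key=lambda a: q_row[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_row))]


def retrace_coeff(pi_a: float, mu_a: float, lam: float) -> float:
    """Truncated importance ratio used as the trace coefficient c_s."""
    return lam * min(1.0, pi_a / mu_a)


def return_based_target(
    rewards: Sequence[float],
    actions: Sequence[int],
    q_target: Sequence[Sequence[float]],
    mu_probs: Sequence[float],
    gamma: float,
    lam: float,
) -> float:
    """Target for Q(x_0, a_0) from a length-T trajectory.

    rewards[s], actions[s], mu_probs[s] = mu(a_s | x_s) for s = 0..T-1;
    q_target[s] is the target-network Q-row at x_s for s = 0..T.
    """
    target = q_target[0][actions[0]]
    trace = 1.0     # product c_1 * ... * c_s (empty product when s = 0)
    discount = 1.0  # gamma ** s
    for s in range(len(rewards)):
        if s > 0:
            pi_s = greedy_probs(q_target[s])
            trace *= retrace_coeff(pi_s[actions[s]], mu_probs[s], lam)
        # Expected next-state value under the target policy pi.
        pi_next = greedy_probs(q_target[s + 1])
        expected_next = sum(p * q for p, q in zip(pi_next, q_target[s + 1]))
        # One-step temporal-difference error at step s.
        delta = rewards[s] + gamma * expected_next - q_target[s][actions[s]]
        target += discount * trace * delta
        discount *= gamma
    return target
```

When every behaviour action is greedy and λ = 1, the traces stay at 1 and the target telescopes to the ordinary n-step return Σ_s γ^s r_s + γ^T max_a Q(x_T, a); a single non-greedy action sets π = 0, which cuts the trace and falls back to shorter bootstrapping. Other return-based methods differ mainly in how `retrace_coeff` is defined.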
Related papers
- A Theoretical Analysis Of Deep Q-learning (2019)
- Approximating Gradients For Differentiable Quality Diversity In Reinforcement Learning (2022)
- Quantile-based Deep Reinforcement Learning Using Two-timescale Policy Gradient Algorithms (2023)
- Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms (2019)
- WD3: Taming The Estimation Bias In Deep Reinforcement Learning (2020)
- A Deep Policy Inference Q-network For Multi-agent Systems (2017)
- On The Convergence And Sample Complexity Analysis Of Deep Q-networks With ε-greedy Exploration (2023)
- Beyond Expected Return: Accounting For Policy Reproducibility When Evaluating Reinforcement Learning Algorithms (2023)