Abstract

The linear regression (LR) method offers the advantage that optimal parameters can be calculated relatively easily, although its representation capability is limited than that of the deep learning technique. To improve deep reinforcement learning, the Least Squares Deep Q Network (LS-DQN) method was proposed by Levine et al., which combines Deep Q Network (DQN) with LR method. However, the LS-DQN method assumes that the actions are discrete. In this study, we propose the Double Least Squares Deep Deterministic Policy Gradient (DLS-DDPG) method to address this limitation. This method combines the LR method with the Deep Deterministic Policy Gradient (DDPG) technique, one of the representative deep reinforcement learning algorithms for continuous action cases. For the LR update of the critic network, DLS-DDPG uses an algorithm similar to the Fitted Q iteration, the method which LS-DQN adopted. In addition, we calculated the optimal action using the quasi-Newton method and used it as both

Authors

(none)

Tags

  • Policy Gradient

Stats

Related papers